Abstract

The paper studies the problem of recovering a spectrally sparse object from a small number of time
domain samples. Specifically, the object of interest with ambient dimension n is assumed to be a mixture
of r complex sinusoids, while the underlying frequencies can assume any continuous values in the unit disk.
Conventional compressed sensing paradigms suffer from the basis mismatch issue when imposing a discrete
dictionary on the Fourier representation. To address this problem, we develop a novel nonparametric
algorithm, called Enhanced Matrix Completion (EMaC), based on structured matrix completion. The
algorithm starts by arranging the data into a low-rank enhanced form with multi-fold Hankel structure
whose rank is upper bounded by r, and then attempts recovery via nuclear norm minimization. Under
mild incoherence conditions, EMaC allows perfect recovery as soon as the number of samples exceeds the
order of $r \log^2 n$, and is robust against bounded noise. Even if a constant portion of samples are corrupted
with arbitrary magnitude, EMaC can still allow accurate recovery if the number of samples exceeds the
order of $r \log^3 n$. Along the way, our results demonstrate that accurate completion of a low-rank multi-fold
Hankel matrix is possible when the number of observed entries is proportional to the information
theoretical limits (except for a logarithmic gap) under mild conditions. The performance of our algorithm
and its applicability to super resolution are further demonstrated by numerical experiments.

1 Introduction

1.1 Motivation and Contributions 



A large class of practical applications features high-dimensional objects that can be modeled or approximated
by a superposition of spikes in the spectral (resp. time) domain, and involves estimation of the object from
its time (resp. frequency) domain samples. A partial list includes acceleration of medical imaging [1],
target localization in radar and sonar systems [5], inverse scattering in seismic imaging [3], fluorescence
microscopy [1], channel estimation in wireless communications [5], etc. The data acquisition devices,
however, are often limited by hardware and physical constraints (e.g. the diffraction limit in an optical 
system), precluding sampling with desired resolution. It is thus of paramount interest to reduce the sensing 
complexity while retaining recovery resolution. 

Fortunately, in many instances, it is possible to recover an object even when the number of samples is far 
below the ambient dimension, provided that the object has a parsimonious representation in the transform 
domain. In particular, recent advances in Compressed Sensing (CS) [TUS] popularize nonparametric methods 
based on convex surrogates. Such tractable methods do not require prior information on the model order, 
and are often robust against noise and outliers [SlIlOj. 

Nevertheless, the success of CS relies on sparse representation or approximation of the object of interest
in a finite discrete dictionary, while the true parameters in many applications are actually specified in a
continuous dictionary. For concreteness, consider an object x(t) that is a weighted sum of K-dimensional
sinusoids at r distinct frequencies $\{f_i \in [0,1]^K : 1 \le i \le r\}$. Conventional CS paradigms operate under the
assumption that these frequencies lie on a pre-determined grid on the unit disk. However, caution must
be exercised when imposing a discrete dictionary on continuous frequencies, since nature never places the
frequencies on the pre-determined grid, no matter how fine the grid is [11], [12]. This issue, known as basis
mismatch between the true frequencies and the discretized grid, results in loss of sparsity due to spectral
leakage along the Dirichlet kernel, and hence degradation in the performance of CS algorithms. While one
might impose finer gridding to mitigate this weakness, this approach often leads to numerical instability
and high correlation between dictionary elements, which significantly weakens the advantage of these CS
approaches [13].
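
The loss of sparsity caused by basis mismatch can be observed numerically. The following is a minimal sketch, assuming a 1-D signal of length 64 and a detection threshold of 1e-8 (both are illustrative choices, not values taken from the paper): a sinusoid whose frequency sits on the DFT grid has a single nonzero DFT coefficient, whereas a sinusoid halfway between two grid points leaks into every coefficient.

```python
import numpy as np

def dft_support_size(freq, n=64, tol=1e-8):
    """Count the DFT coefficients of a unit-amplitude complex sinusoid whose magnitude exceeds tol."""
    t = np.arange(n)
    x = np.exp(2j * np.pi * freq * t)      # samples of a single complex sinusoid
    coeffs = np.fft.fft(x) / n             # normalized DFT coefficients
    return int(np.sum(np.abs(coeffs) > tol))

on_grid = dft_support_size(5 / 64)         # frequency coincides with a grid point
off_grid = dft_support_size(5.5 / 64)      # frequency halfway between two grid points
print(on_grid, off_grid)
```

The off-grid coefficients follow the Dirichlet kernel, so they decay only slowly away from the true frequency and none of them vanishes.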

In this paper, we consider the spectral compressed sensing problem, which aims to recover a spectrally
sparse object from a small set of time domain samples. The underlying (possibly multi-dimensional)
frequencies can assume any value on the unit disk, and need to be recovered with infinite precision. We develop
a nonparametric algorithm, called Enhanced Matrix Completion (EMaC), based on structured matrix
completion. Specifically, EMaC starts by converting the data samples into an enhanced matrix with K-fold
Hankel structures whose rank is upper bounded by r, and then solves a nuclear norm minimization program
to complete the enhanced matrix. We show that, under mild incoherence conditions, EMaC admits exact
recovery from $O(r \log^2 n)$ random samples, where r and n denote respectively the spectral sparsity and
the ambient dimension. Our results further demonstrate that under the same mild incoherence conditions,
EMaC is robust against bounded noise and sparse corruptions. In particular, the robust version of EMaC
admits exact recovery from $O(r \log^3 n)$ random samples even when a constant proportion of the samples are
corrupted with arbitrary magnitudes. Furthermore, numerical experiments validate our theoretical findings,
and show that EMaC is also applicable to the problem of super resolution.

Additionally, we provide theoretical guarantee for low-rank Hankel matrix completion, which is of great 
importance in control, natural language processing, and computer vision. To the best of our knowledge, our 
results provide the first theoretical bounds that are close to the information theoretic limit. 

1.2 Connection to Prior Work 

Spectral compressed sensing is closely related to harmonic retrieval, which seeks to extract the underlying
frequencies of an object from a collection of its time domain samples. Conventional methods for harmonic
retrieval include Prony's method [14], ESPRIT [15], the matrix pencil method [16], the Tufts and
Kumaresan approach [17], etc. These methods are typically based on the eigenvalue decomposition of covariance
matrices constructed from equi-spaced samples, which can accommodate infinite frequency precision in the
absence of noise. The K-fold Hankel structure, which plays a central role in the EMaC algorithm, has its roots in
the traditional spectral estimation literature under the name "matrix enhancement matrix pencil" [18] for
estimating multi-dimensional frequencies. One weakness of these techniques is that they are parametric
and require prior knowledge of the model order, that is, the number of underlying frequency spikes of the
signal or, at least, an estimate of it. Besides, their performance largely depends on the knowledge of
noise spectra; these methods are often sensitive to noise and unstable against outliers [12]. The finite
rate of innovation approach [20, 21], while not restricting itself to equi-spaced samples, also requires prior
knowledge of the model order and may not be easily extended to handle signals that are approximately sparse
or vulnerable to outliers.

Nonparametric algorithms based on convex optimization differ from the above parametric techniques in
that the model order does not need to be specified a priori. For instance, the CS paradigms [TjiS] assert
that it is possible to recover a spectrally sparse signal from a small number of time domain samples. These
algorithms admit faithful recovery even when the samples are contaminated by bounded noise [9], [22] or
arbitrary sparse outliers [10]. By assuming the frequency spikes lie on an a priori grid, rather than identifying
the locations of frequencies, CS selects them from a pre-determined set. However, there is an inevitable
basis mismatch between the true frequencies and the discrete grid, no matter how fine the grid is [11], [23]. A
degradation in performance of CS algorithms is reported due to loss of sparsity via spectral leakage along
the Dirichlet kernel.

Recently, Candes and Fernandez-Granda [24] proposed a total-variation norm minimization algorithm to
super-resolve a sparse object from frequency samples at the low end of the spectrum. This algorithm allows
accurate super-resolution when the point sources are appropriately separated, and is stable against noise [25].
Inspired by this approach, Tang et al. [13] then developed an atomic norm minimization algorithm [26] for
line spectral estimation from $O(r \log r \log n)$ random time domain samples, which achieves exact recovery
under mild separation conditions. However, these works are limited to one-dimensional (1-D) frequency
models, require that the complex signs of the frequency spikes are drawn i.i.d. from a uniform distribution,
and their robustness against sparse outliers has not been established.

In contrast, our approach can accommodate multi-dimensional frequencies, and only assumes randomness
in the observation basis. The algorithm is inspired by recent advances in Matrix Completion (MC), which
aims at recovering a low-rank matrix from partial entries. It has been shown [27], [28] that exact recovery
is possible via nuclear norm minimization, as soon as the number of observed entries is on the order of the
information theoretic limit. This line of algorithms is also robust against noise and outliers [29], [30], and has
found numerous applications in collaborative filtering [2Il[31j, medical imaging [33], etc. Nevertheless, the
theoretical guarantees of these algorithms do not apply to the more structured observation models associated
with the proposed multi-fold Hankel structure. Consequently, direct application of existing MC results yields
pessimistic bounds on the required number of samples, which is far beyond the degrees of freedom underlying
the sparse object.

1.3 Organization 

The rest of the paper is organized as follows. The data and sample models are described in Section 2. By
restricting our attention to two-dimensional (2-D) frequency models, we present the enhanced matrix and the
associated matrix completion algorithms. The extension to high-dimensional frequency models is discussed
in Section 2.6. The main theoretical guarantees are summarized in Section 3, based on the incoherence
conditions introduced in Section 3.1. We then discuss the extension to low-rank Hankel matrix completion
in Section 4. Section 5 presents the numerical validation of our algorithms. The proofs for Theorem 1 and
Theorem 3 are based on duality analysis and a golfing scheme ^E\ , which are supplied in Section 6 and
Section 7, respectively. Section 8 concludes the paper with a short summary of our findings and a discussion
of potential extensions and improvements. Finally, proofs of auxiliary lemmas supporting our results are
provided in the Appendix.

2 Model and Algorithm 

Assume that the object of interest x(t) can be modeled as a weighted sum of K-dimensional sinusoids at r
distinct frequencies $f_i \in [0,1]^K$ ($1 \le i \le r$), i.e.

\[ x(t) = \sum_{i=1}^{r} d_i e^{j2\pi \langle t, f_i \rangle}, \]

where the $d_i$'s denote the complex amplitudes, and $\langle \cdot, \cdot \rangle$ denotes the inner product. Here, it is assumed that
the frequencies $f_i$'s are normalized with respect to the Nyquist frequency of x(t). For concreteness, our
discussion is mainly devoted to a 2-D frequency model. This subsumes line spectral estimation as a special
case, and indicates how to address multi-dimensional models. The extension to higher dimension scenarios
is discussed in Section 2.6.



2.1 2-D Frequency Model 

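Consider a data matrix $X = [X_{k,l}]_{0 \le k < n_1,\, 0 \le l < n_2}$ of ambient dimension $n_1 \times n_2$, which can be
obtained by sampling x(t) on a uniform grid at the Nyquist rate. Each entry $X_{k,l}$ can be expressed as

\[ X_{k,l} = \sum_{i=1}^{r} d_i y_i^{k} z_i^{l}, \tag{1} \]

where for any i ($1 \le i \le r$), we have

\[ y_i = \exp(j2\pi f_{1i}) \quad \text{and} \quad z_i = \exp(j2\pi f_{2i}), \]

for some frequency pairs $\{(f_{1i}, f_{2i}) \mid 1 \le i \le r\}$. We can then express X in a matrix form as follows

\[ X = Y D Z^{\top}, \]

where the above matrices are defined as

\[ Y := \begin{bmatrix} 1 & 1 & \cdots & 1 \\ y_1 & y_2 & \cdots & y_r \\ \vdots & \vdots & & \vdots \\ y_1^{n_1-1} & y_2^{n_1-1} & \cdots & y_r^{n_1-1} \end{bmatrix}, \quad
Z := \begin{bmatrix} 1 & 1 & \cdots & 1 \\ z_1 & z_2 & \cdots & z_r \\ \vdots & \vdots & & \vdots \\ z_1^{n_2-1} & z_2^{n_2-1} & \cdots & z_r^{n_2-1} \end{bmatrix}, \quad
D := \mathrm{diag}[d_1, d_2, \cdots, d_r]. \]

This is sometimes referred to as the Vandermonde decomposition of X.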
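
As a quick numerical check of the decomposition, the sketch below builds X both entry by entry from (1) and as the product $Y D Z^{\top}$, and confirms that the two agree. The sizes, the random seed and the variable names are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r = 8, 9, 3                                    # illustrative sizes (assumed)
f1, f2 = rng.random(r), rng.random(r)                  # frequency pairs in [0, 1)^2
d = rng.standard_normal(r) + 1j * rng.standard_normal(r)  # complex amplitudes

y = np.exp(2j * np.pi * f1)
z = np.exp(2j * np.pi * f2)

# Vandermonde factors: Y[k, i] = y_i^k and Z[l, i] = z_i^l
Y = y[None, :] ** np.arange(n1)[:, None]
Z = z[None, :] ** np.arange(n2)[:, None]
X = Y @ np.diag(d) @ Z.T                               # plain transpose, no conjugation

# Entrywise definition X[k, l] = sum_i d_i y_i^k z_i^l
X_entrywise = np.array([[np.sum(d * y**k * z**l) for l in range(n2)]
                        for k in range(n1)])

rank_X = np.linalg.matrix_rank(X, tol=1e-8)
print(np.allclose(X, X_entrywise), rank_X)
```

Note that the factor on the right is the plain transpose of Z rather than its conjugate transpose, which is what (1) requires.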

Suppose that there exists a location set $\Omega$ of size m such that $X_{k,l}$ is observed if and only if $(k,l) \in \Omega$. It
is assumed that $\Omega$ is selected uniformly at random. We are interested in recovering X from the observation
of its entries on a small location set $\Omega$.



2.2 Matrix Enhancement 

One might naturally attempt recovery by applying the low-rank MC algorithms [27], arguing that when r is
small, perfect recovery of X is possible from partial measurements since X is low rank if $r \ll \min\{n_1, n_2\}$.
Specifically, this corresponds to the following algorithm:

\[ \text{minimize} \quad \|M\|_* \qquad \text{subject to} \quad M_{k,l} = X_{k,l}, \quad \forall (k,l) \in \Omega, \tag{6} \]

where $\|M\|_*$ denotes the nuclear norm (or sum of all singular values) of $M = [M_{k,l}]$. This is a convex
relaxation paradigm of the rank minimization problem. However, naive MC algorithms [28] require at least
the order of $r \max(n_1, n_2) \log(n_1 n_2)$ samples, which far exceeds the degrees of freedom (which is $\Theta(r)$) in
our problem. What is worse, when $r > \min(n_1, n_2)$ (which is possible since r can be as large as $n_1 n_2$), X is
no longer low-rank. This motivates us to construct other low-rank forms that better capture the harmonic
structure.

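In this paper, we adopt one effective enhanced form of X based on two-fold Hankel structure as follows.
The enhanced matrix $X_e$ with respect to X is defined as a $k_1 \times (n_1 - k_1 + 1)$ block Hankel matrix

\[ X_e := \begin{bmatrix} X_0 & X_1 & \cdots & X_{n_1-k_1} \\ X_1 & X_2 & \cdots & X_{n_1-k_1+1} \\ \vdots & \vdots & \ddots & \vdots \\ X_{k_1-1} & X_{k_1} & \cdots & X_{n_1-1} \end{bmatrix}, \tag{7} \]

where $1 \le k_1 \le n_1$, and each block is a $k_2 \times (n_2 - k_2 + 1)$ Hankel matrix defined such that for every $\ell$
($0 \le \ell < n_1$):

\[ X_\ell := \begin{bmatrix} X_{\ell,0} & X_{\ell,1} & \cdots & X_{\ell,n_2-k_2} \\ X_{\ell,1} & X_{\ell,2} & \cdots & X_{\ell,n_2-k_2+1} \\ \vdots & \vdots & \ddots & \vdots \\ X_{\ell,k_2-1} & X_{\ell,k_2} & \cdots & X_{\ell,n_2-1} \end{bmatrix}, \tag{8} \]

where $1 \le k_2 \le n_2$. This enhanced form allows us to express each block as¹

\[ X_\ell = Z_L Y_d^{\ell} D Z_R, \tag{9} \]

where $Z_L$, $Z_R$ and $Y_d$ are defined as

\[ Z_L := \begin{bmatrix} 1 & 1 & \cdots & 1 \\ z_1 & z_2 & \cdots & z_r \\ \vdots & \vdots & & \vdots \\ z_1^{k_2-1} & z_2^{k_2-1} & \cdots & z_r^{k_2-1} \end{bmatrix}, \quad
Z_R := \begin{bmatrix} 1 & z_1 & \cdots & z_1^{n_2-k_2} \\ 1 & z_2 & \cdots & z_2^{n_2-k_2} \\ \vdots & \vdots & & \vdots \\ 1 & z_r & \cdots & z_r^{n_2-k_2} \end{bmatrix}, \]

and

\[ Y_d := \mathrm{diag}[y_1, y_2, \cdots, y_r]. \]

Substituting (9) into (7) yields the following:

\[ X_e = \underbrace{\begin{bmatrix} Z_L \\ Z_L Y_d \\ \vdots \\ Z_L Y_d^{k_1-1} \end{bmatrix}}_{\sqrt{k_1 k_2}\, E_L} D \underbrace{\begin{bmatrix} Z_R, & Y_d Z_R, & \cdots, & Y_d^{n_1-k_1} Z_R \end{bmatrix}}_{\sqrt{(n_1-k_1+1)(n_2-k_2+1)}\, E_R}, \tag{10} \]

where $E_L$ and $E_R$ characterize the column and row space of $X_e$, respectively. This immediately implies
that $X_e$ is low-rank, i.e.,

\[ \mathrm{rank}(X_e) \le r. \tag{11} \]

This form aligns with the traditional matrix pencil approach proposed in [16], [18] to estimate harmonic
frequencies if all entries of X are available. Thus, one can extract all underlying frequencies of X using
methods proposed in [15], as long as X can be faithfully recovered.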
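
The rank bound on the enhanced form can be checked numerically. The sketch below is only an illustration under assumed sizes ($n_1 = n_2 = 5$, $k_1 = k_2 = 3$, $r = 6$); the helper names `hankel_block` and `enhance` are hypothetical and do not come from the paper. It builds the two-fold Hankel matrix from a synthetic X and computes its numerical rank.

```python
import numpy as np

def hankel_block(row, k2):
    """k2 x (n2 - k2 + 1) Hankel matrix built from one row of X."""
    n2 = row.shape[0]
    return np.array([[row[a + b] for b in range(n2 - k2 + 1)] for a in range(k2)])

def enhance(X, k1, k2):
    """k1 x (n1 - k1 + 1) block Hankel matrix whose (a, b) block is the Hankel form of row a + b."""
    n1 = X.shape[0]
    blocks = [hankel_block(X[l], k2) for l in range(n1)]
    return np.block([[blocks[a + b] for b in range(n1 - k1 + 1)] for a in range(k1)])

rng = np.random.default_rng(1)
n1, n2, k1, k2, r = 5, 5, 3, 3, 6                      # assumed sizes; note r > min(n1, n2)
y = np.exp(2j * np.pi * rng.random(r))
z = np.exp(2j * np.pi * rng.random(r))
d = rng.standard_normal(r) + 1j * rng.standard_normal(r)

# X[k, l] = sum_i d_i y_i^k z_i^l
X = np.array([[np.sum(d * y**k * z**l) for l in range(n2)] for k in range(n1)])

Xe = enhance(X, k1, k2)
rank_Xe = np.linalg.matrix_rank(Xe, tol=1e-8)
print(Xe.shape, rank_Xe)
```

In this configuration X is only 5 x 5, so it cannot reveal a model of order 6 through its own rank, while the 9 x 9 enhanced matrix still has rank at most r.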

2.3 The EMaC Algorithm without Noise 

Define $\mathcal{P}_\Omega(X)$ as the orthogonal projection of X onto the subspace of matrices that vanish outside $\Omega$. We
then attempt recovery through the following Enhanced Matrix Completion (EMaC) algorithm:

\[ \text{(EMaC)} \qquad \underset{M \in \mathbb{C}^{n_1 \times n_2}}{\text{minimize}} \quad \|M_e\|_* \qquad \text{subject to} \quad \mathcal{P}_\Omega(M) = \mathcal{P}_\Omega(X), \tag{12} \]

where $M_e$ denotes the enhanced form of M. In other words, EMaC minimizes the nuclear norm of the
enhanced form over the constraint set. This convex program can be solved using off-the-shelf semidefinite
program solvers in a tractable manner (see, e.g., [34]).



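¹ Note that the $\ell$-th row $X_{\ell,*}$ of X can be expressed as
\[ X_{\ell,*} = [y_1^{\ell}, \cdots, y_r^{\ell}]\, D Z^{\top} = [y_1^{\ell} d_1, \cdots, y_r^{\ell} d_r]\, Z^{\top}, \]
and hence we only need to find the Vandermonde decomposition for $X_0$ and then replace $d_i$ by $y_i^{\ell} d_i$.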



2.4 The Noisy-EMaC Algorithm with Bounded Noise 

In practice, measurements are almost always contaminated by a certain amount of noise. To make our model
and algorithm more practically applicable, we can replace our measurements by $X^o = [X^o_{k,l}]_{0 \le k < n_1,\, 0 \le l < n_2}$
through the following noisy model

\[ \forall (k,l) \in \Omega: \quad X^o_{k,l} = X_{k,l} + N_{k,l}, \tag{13} \]

where $X^o_{k,l}$ is the observed (k,l)-th entry, and $N = [N_{k,l}]_{0 \le k < n_1,\, 0 \le l < n_2}$ denotes some unknown noise. We
assume that the noise magnitude is bounded by a known amount $\|\mathcal{P}_\Omega(N)\|_F \le \delta$, where $\|\cdot\|_F$ denotes
the Frobenius norm. In order to adapt our algorithm to such noisy measurements, one wishes that a small
perturbation in the measurements should result in a small change in the estimate. Our algorithm is then
modified as follows

\[ \text{(Noisy-EMaC)} \qquad \underset{M \in \mathbb{C}^{n_1 \times n_2}}{\text{minimize}} \quad \|M_e\|_* \qquad \text{subject to} \quad \|\mathcal{P}_\Omega(M - X^o)\|_F \le \delta. \tag{14} \]

That said, the algorithm searches for a candidate with minimum nuclear norm among all signals close to the
measurements.

2.5 The Robust-EMaC Algorithm with Sparse Outliers 

An outlier is a data sample that can deviate arbitrarily from the true data point. Practical data samples 
one collects may contain a certain portion of outliers due to abnormal behavior of data acquisition devices 
such as amplifier saturation, sensor failures, and malicious attacks. A desired recovery algorithm should be 
able to automatically identify the set of outliers even when they arise in up to a constant portion of all data 
samples. 

Specifically, suppose that our measurements $X^o$ are given by

\[ \forall (k,l) \in \Omega: \quad X^o_{k,l} = X_{k,l} + S_{k,l}, \tag{15} \]

where $X^o_{k,l}$ is the observed (k,l)-th entry, and $S = [S_{k,l}]_{0 \le k < n_1,\, 0 \le l < n_2}$ denotes the outliers, which is assumed
to be a sparse matrix supported on some location set $\Omega^{\mathrm{dirty}} \subseteq \Omega$. We assume that the sparsity pattern of S
is selected uniformly at random conditioned on $\Omega$. More specifically, we assume the following model:

1. Suppose that $\Omega$ is obtained by sampling m entries uniformly at random, and define $\rho := \frac{m}{n_1 n_2}$.

2. Conditioning on $(k,l) \in \Omega$, the events $\{(k,l) \in \Omega^{\mathrm{dirty}}\}$ are independent with conditional probability
   \[ \mathbb{P}\{(k,l) \in \Omega^{\mathrm{dirty}} \mid (k,l) \in \Omega\} = s \]
   for some small constant corruption fraction $0 \le s \le 1$.

3. Define $\Omega^{\mathrm{clean}} := \Omega \setminus \Omega^{\mathrm{dirty}}$ as the location set of uncorrupted measurements.

EMaC is then modified as follows to accommodate sparse outliers:

\[ \text{(Robust-EMaC)} \qquad \underset{M, \hat{S} \in \mathbb{C}^{n_1 \times n_2}}{\text{minimize}} \quad \|M_e\|_* + \lambda \|\hat{S}_e\|_1 \qquad \text{subject to} \quad \mathcal{P}_\Omega(M + \hat{S}) = \mathcal{P}_\Omega(X + S), \tag{16} \]

where $M_e$ and $\hat{S}_e$ denote the enhanced forms of M and $\hat{S}$, respectively. Here, $\|\hat{S}_e\|_1 := \|\mathrm{vec}(\hat{S}_e)\|_1$
denotes the $\ell_1$-norm of the vectorized $\hat{S}_e$. The objective function of Robust-EMaC is the convex envelope of
$\mathrm{rank}(M_e) + \lambda \|\hat{S}_e\|_0$, which captures the low-rankness of the enhanced form as well as the sparsity of the
outliers.
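
The three-step sampling and corruption model of this subsection can be simulated as follows. This is a minimal sketch: the function name `sample_locations`, the sizes and the seed are hypothetical, and it assumes that the m observed locations are drawn without replacement, which the text does not state explicitly.

```python
import numpy as np

def sample_locations(n1, n2, m, s, rng):
    """Draw the observed set, then mark each observed entry as corrupted with probability s."""
    flat = rng.choice(n1 * n2, size=m, replace=False)  # step 1: m distinct locations (assumed)
    is_dirty = rng.random(m) < s                       # step 2: independent corruption events
    omega = {(int(i // n2), int(i % n2)) for i in flat}
    dirty = {(int(i // n2), int(i % n2)) for i in flat[is_dirty]}
    clean = omega - dirty                              # step 3: uncorrupted locations
    return omega, dirty, clean

rng = np.random.default_rng(2)
n1, n2, m, s = 11, 11, 50, 0.1                         # assumed values for illustration
omega, dirty, clean = sample_locations(n1, n2, m, s, rng)
rho = m / (n1 * n2)                                    # sampling ratio
print(len(omega), len(dirty), len(clean), rho)
```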



2.6 Extension to Higher-Dimensional Frequency Models 

The EMaC method extends to higher dimensional frequency models without difficulty. In fact, for
K-dimensional frequency models, one can arrange the original data into a K-fold Hankel matrix of rank at
most r. For instance, consider a 3-D model such that

\[ X_{i_1, i_2, i_3} = \sum_{i=1}^{r} d_i\, y_i^{i_1} z_i^{i_2} w_i^{i_3}. \]

An enhanced form can be defined as a 3-fold Hankel matrix such that

\[ X_e := \begin{bmatrix} X_{0,e} & X_{1,e} & \cdots & X_{n_3-k_3,e} \\ \vdots & \vdots & \ddots & \vdots \\ X_{k_3-1,e} & X_{k_3,e} & \cdots & X_{n_3-1,e} \end{bmatrix}, \]

where $X_{i,e}$ denotes the 2-D enhanced form of the matrix consisting of all entries $X_{i_1,i_2,i_3}$ obeying $i_3 = i$.
One can verify that $X_e$ is of rank at most r, and can thereby apply EMaC on the 3-D enhanced form. To
summarize, for K-dimensional frequency models, EMaC (resp. Noisy-EMaC, Robust-EMaC) searches over
all K-fold Hankel matrices that are faithful with the measurements.

2.7 Notations 

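Before continuing, we introduce a few notations that will be used throughout. Let the singular value
decomposition (SVD) of $X_e$ be $X_e = U \Lambda V^*$. Denote by

\[ T := \left\{ U M^* + \tilde{M} V^* : M \in \mathbb{C}^{(n_1-k_1+1)(n_2-k_2+1) \times r},\ \tilde{M} \in \mathbb{C}^{k_1 k_2 \times r} \right\} \]

the tangent space with respect to $X_e$, and $T^{\perp}$ the orthogonal complement of T. Denote by $\mathcal{P}_U$ (resp. $\mathcal{P}_V$,
$\mathcal{P}_T$) the orthogonal projections onto the column (resp. row, tangent) space of $X_e$, i.e. for any M

\[ \mathcal{P}_U M = U U^* M; \qquad \mathcal{P}_V M = M V V^*; \qquad \mathcal{P}_T = \mathcal{P}_U + \mathcal{P}_V - \mathcal{P}_U \mathcal{P}_V. \]

We let $\mathcal{P}_{T^{\perp}} = \mathcal{I} - \mathcal{P}_T$ be the orthogonal complement of $\mathcal{P}_T$, where $\mathcal{I}$ denotes the identity operator.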
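
The tangent-space projections can be sanity-checked numerically. The sketch below uses a generic rank-2 real matrix in place of $X_e$ (an assumption made purely for brevity; the sizes and names are illustrative) and verifies that $\mathcal{P}_T$ is idempotent, fixes the low-rank matrix itself, and is orthogonal to $\mathcal{P}_{T^{\perp}}$.

```python
import numpy as np

rng = np.random.default_rng(3)
r = 2
# A generic rank-r matrix standing in for the enhanced matrix X_e (assumption)
A = rng.standard_normal((6, r)) @ rng.standard_normal((r, 7))

U_full, svals, Vh = np.linalg.svd(A, full_matrices=False)
U = U_full[:, :r]                 # left singular vectors spanning the column space
V = Vh[:r].conj().T               # right singular vectors spanning the row space

def P_T(M):
    """Projection onto the tangent space: P_U + P_V - P_U P_V."""
    PU_M = U @ U.conj().T @ M
    return PU_M + M @ V @ V.conj().T - PU_M @ V @ V.conj().T

def P_T_perp(M):
    """Complementary projection I - P_T."""
    return M - P_T(M)

M = rng.standard_normal((6, 7))
idempotent = np.allclose(P_T(P_T(M)), P_T(M))
fixes_A = np.allclose(P_T(A), A)
inner = abs(np.vdot(P_T(M), P_T_perp(M)))   # should vanish up to round-off
print(idempotent, fixes_A, inner)
```

Expanding the definition also shows that $\mathcal{P}_{T^{\perp}} M = (I - UU^*) M (I - VV^*)$, which the test below confirms.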

Denote by $\|M\|$, $\|M\|_F$, $\|M\|_*$ the spectral norm (operator norm), Frobenius norm, and nuclear norm
of M. Also, $\|M\|_1$ and $\|M\|_\infty$ are defined to be the $\ell_1$ and $\ell_\infty$ norm of the vectorized M. Additionally,
we use sgn(M) to denote the elementwise complex sign of M.

On the other hand, we denote by $\Omega_e(k,l)$ the set of locations of the enhanced matrix $X_e$ containing
copies of $X_{k,l}$. Due to the Hankel and multi-fold Hankel structures, one can easily verify the following: for
any $\Omega_e(k,l)$, there exists at most one index in any given row of the enhanced form, and at most one index
in any given column. For each $(k,l) \in [n_1] \times [n_2]$, we use $A_{(k,l)}$ to denote a basis matrix that extracts the
average of all entries in $\Omega_e(k,l)$. Specifically,

\[ \left(A_{(k,l)}\right)_{\alpha,\beta} := \begin{cases} \frac{1}{\sqrt{|\Omega_e(k,l)|}}, & \text{if } (\alpha,\beta) \in \Omega_e(k,l), \\ 0, & \text{else.} \end{cases} \tag{17} \]

We will use $\omega_{k,l} := |\Omega_e(k,l)|$ as a shorthand notation.
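
The index sets and basis matrices can be enumerated explicitly for a small example. The sketch below is illustrative only: the function names and sizes are hypothetical, and it assumes that each basis matrix is scaled to unit Frobenius norm, so that the basis matrices are orthonormal. It checks that every index set hits each row and each column of the enhanced form at most once.

```python
import numpy as np

def enhanced_index_sets(n1, n2, k1, k2):
    """Map each (k, l) to the positions (alpha, beta) of the enhanced form holding a copy of X[k, l]."""
    cols_per_block = n2 - k2 + 1
    sets = {(k, l): [] for k in range(n1) for l in range(n2)}
    for a in range(k1):                          # block row
        for b in range(n1 - k1 + 1):             # block column
            for c in range(k2):                  # row inside the Hankel block
                for e in range(cols_per_block):  # column inside the Hankel block
                    sets[(a + b, c + e)].append((a * k2 + c, b * cols_per_block + e))
    return sets

def basis_matrix(positions, shape):
    """Unit-Frobenius-norm indicator of one index set (normalization is an assumption)."""
    A = np.zeros(shape)
    for alpha, beta in positions:
        A[alpha, beta] = 1.0 / np.sqrt(len(positions))
    return A

n1, n2, k1, k2 = 5, 4, 3, 2                      # assumed sizes for illustration
shape = (k1 * k2, (n1 - k1 + 1) * (n2 - k2 + 1))
sets = enhanced_index_sets(n1, n2, k1, k2)
omega = {kl: len(pos) for kl, pos in sets.items()}   # omega_{k,l} = number of copies
A = {kl: basis_matrix(pos, shape) for kl, pos in sets.items()}
print(omega[(0, 0)], omega[(2, 1)])
```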

3 Main Results 

Encouragingly, under certain incoherence conditions, the simple EMaC enables faithful recovery of the true
data matrix from a small number of time-domain samples, even when the samples are contaminated
by bounded noise or a constant portion of arbitrary outliers.



3.1 Incoherence Measures 

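In general, matrix completion from a few entries is hopeless unless the underlying structure is uncorrelated
with the observation basis. This inspires us to introduce certain incoherence measures. Let $G_L$ and $G_R$ be
two $r \times r$ correlation matrices such that

\[ (G_L)_{i_1,i_2} := \begin{cases} \dfrac{1}{k_1 k_2}\, \dfrac{1 - (y_{i_1}^* y_{i_2})^{k_1}}{1 - y_{i_1}^* y_{i_2}}\, \dfrac{1 - (z_{i_1}^* z_{i_2})^{k_2}}{1 - z_{i_1}^* z_{i_2}}, & \text{if } i_1 \ne i_2, \\ 1, & \text{if } i_1 = i_2, \end{cases} \]

\[ (G_R)_{i_1,i_2} := \begin{cases} \dfrac{1}{(n_1-k_1+1)(n_2-k_2+1)}\, \dfrac{1 - (y_{i_1}^* y_{i_2})^{n_1-k_1+1}}{1 - y_{i_1}^* y_{i_2}}\, \dfrac{1 - (z_{i_1}^* z_{i_2})^{n_2-k_2+1}}{1 - z_{i_1}^* z_{i_2}}, & \text{if } i_1 \ne i_2, \\ 1, & \text{if } i_1 = i_2. \end{cases} \]

Denote the smallest singular values of $G_L$ and $G_R$ as $\sigma_{\min}(G_L)$ and $\sigma_{\min}(G_R)$. Note that $G_L$ and $G_R$ can
be obtained by sampling the 2-D Dirichlet kernel, which is frequently considered in Fourier analysis. Our
incoherence measure is then defined as follows.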

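Definition 1. [Incoherence] Let $X_e$ denote the enhanced matrix associated with X, and suppose the SVD
of $X_e$ is given by $X_e = U \Lambda V^*$. Then X is said to have incoherence $(\mu_1, \mu_2, \mu_3)$ if they are respectively the
smallest values obeying

\[ \sigma_{\min}(G_L) \ge \frac{1}{\mu_1}, \qquad \sigma_{\min}(G_R) \ge \frac{1}{\mu_1}; \tag{18} \]

\[ \max_{(k,l) \in [n_1] \times [n_2]} \frac{1}{\omega_{k,l}^2} \Bigg| \sum_{(\alpha,\beta) \in \Omega_e(k,l)} (U V^*)_{\alpha,\beta} \Bigg|^2 \le \frac{\mu_2 r}{n_1^2 n_2^2}; \tag{19} \]

\[ \forall (k,l) \in [n_1] \times [n_2]: \quad \sum_{(\alpha,\beta) \in [n_1] \times [n_2]} \omega_{\alpha,\beta} \left| \left\langle U U^* A_{(k,l)} V V^*, A_{(\alpha,\beta)} \right\rangle \right|^2 \le \frac{\mu_3 r}{n_1 n_2}\, \omega_{k,l}. \tag{20} \]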

There is a large class of spectrally sparse 2-D signals that exhibit good incoherence properties. In this
paper, we will always choose $k_1 = \Theta(n_1)$, $k_2 = \Theta(n_2)$, $n_1 - k_1 = \Theta(n_1)$, and $n_2 - k_2 = \Theta(n_2)$. To give the
reader a flavor for the incoherence conditions, some interpretations are in order as well as a few examples.
In this subsection, we restrict our attention to square matrices, i.e. $n := n_1 = n_2$. Note that the class of
incoherent signals is far beyond the ones discussed below.

3.1.1 Condition (18)

This specifies certain incoherence among the locations of frequency pairs, which does not coincide with and
is not subsumed by the separation condition required in [13], [24]. The frequencies satisfying Condition (18)
can be either spread out or minimally separated. Two examples are listed as follows.

• Random frequency locations: suppose that the r frequencies are generated uniformly at random, then
the minimum pairwise separation can be crudely bounded by $\Theta\left(\frac{1}{r^2 \log n}\right)$. If $n \gg r^{2.5} \log n$, then a
crude bound yields

\[ \max_{i_1 \ne i_2} \max\left\{ \frac{1}{k_1} \left| \frac{1 - (y_{i_1}^* y_{i_2})^{k_1}}{1 - y_{i_1}^* y_{i_2}} \right|,\ \frac{1}{k_2} \left| \frac{1 - (z_{i_1}^* z_{i_2})^{k_2}}{1 - z_{i_1}^* z_{i_2}} \right| \right\} \ll \frac{1}{\sqrt{r}}, \]

indicating that the off-diagonal entries of $G_L$ and $G_R$ are much smaller than $1/r$ in magnitude. Simple
manipulation then allows us to conclude that $\sigma_{\min}(G_L)$ and $\sigma_{\min}(G_R)$ are bounded below by positive
constants.



Small perturbation off the grid: suppose that all frequencies are within a distance at most — yji from 
some grid points (^, ^) (0 < h, I2 < n). One can verify that 

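\[ \forall i_1 \ne i_2: \quad \max\left\{ \frac{1}{k_1} \left| \frac{1 - (y_{i_1}^* y_{i_2})^{k_1}}{1 - y_{i_1}^* y_{i_2}} \right|,\ \frac{1}{k_2} \left| \frac{1 - (z_{i_1}^* z_{i_2})^{k_2}}{1 - z_{i_1}^* z_{i_2}} \right| \right\} < \frac{1}{2\sqrt{r}}, \]

and hence the magnitudes of all off-diagonal entries of $G_L$ and $G_R$ are no larger than $1/(4r)$. This
immediately reveals that $\sigma_{\min}(G_L)$ and $\sigma_{\min}(G_R)$ are lower bounded by 3/4.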
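
The correlation matrix $G_L$ and its smallest singular value are easy to evaluate numerically. The following sketch is an illustration under assumed sizes ($k_1 = k_2 = 6$, $r = 3$); the function names are hypothetical. It forms $G_L$ as the Gram matrix of unit-norm Kronecker-structured columns and compares it with the closed-form Dirichlet-kernel expression for randomly drawn frequencies. It also covers the special case of frequencies placed exactly on the grid of spacing $1/k_1$ and $1/k_2$, where the columns are orthonormal and $\sigma_{\min}(G_L) = 1$. The closed form divides by zero when two frequencies share one coordinate, so the on-grid case uses the Gram matrix directly.

```python
import numpy as np

def normalized_columns(f1, f2, k1, k2):
    """Unit-norm columns kron(y_i^[0..k1-1], z_i^[0..k2-1]) / sqrt(k1 k2)."""
    y = np.exp(2j * np.pi * np.asarray(f1))
    z = np.exp(2j * np.pi * np.asarray(f2))
    cols = [np.kron(yi ** np.arange(k1), zi ** np.arange(k2)) for yi, zi in zip(y, z)]
    return np.stack(cols, axis=1) / np.sqrt(k1 * k2)

def gram_closed_form(f1, f2, k1, k2):
    """Closed-form entries; valid only when both coordinates of every pair differ."""
    y = np.exp(2j * np.pi * np.asarray(f1))
    z = np.exp(2j * np.pi * np.asarray(f2))
    r = len(y)
    G = np.eye(r, dtype=complex)
    for i1 in range(r):
        for i2 in range(r):
            if i1 != i2:
                wy = np.conj(y[i1]) * y[i2]
                wz = np.conj(z[i1]) * z[i2]
                G[i1, i2] = (1 - wy**k1) / (1 - wy) * (1 - wz**k2) / (1 - wz) / (k1 * k2)
    return G

rng = np.random.default_rng(4)
k1 = k2 = 6
f1, f2 = rng.random(3), rng.random(3)               # random off-grid frequencies
E = normalized_columns(f1, f2, k1, k2)
G_random = E.conj().T @ E
sigma_min_random = np.linalg.svd(G_random, compute_uv=False)[-1]

E_grid = normalized_columns([0, 1/6, 1/6], [0, 0, 3/6], k1, k2)  # distinct grid points
G_grid = E_grid.conj().T @ E_grid
sigma_min_grid = np.linalg.svd(G_grid, compute_uv=False)[-1]
print(sigma_min_random, sigma_min_grid)
```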

3.1.2 Condition (19)

This can be satisfied when the total energy of each skew diagonal of $UV^*$ is proportional to the dimension
of this skew diagonal. This is weaker than the one introduced in [27] for matrix completion, which requires
uniform energy distribution over all entries of $UV^*$. In general, the matrix $UV^*$ relies jointly on the
frequencies and their complex amplitudes. For instance, an ideal $\mu_2$ may arise when the complex phases of
all frequencies are generated in some random fashion.

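3.1.3 Condition (20)

This is an incoherence measure based on the (K-fold) Hankel structure, which depends on the energy
distribution between $A_{(k,l)}$ and the column and row spaces U, V. The following manipulation may allow
the reader to get a flavor for when this incoherence condition may hold.
Let $a, b \in [n_1] \times [n_2]$. One can easily verify that

\[ \left| \left\langle A_b, U U^* A_a V V^* \right\rangle \right| \le \sqrt{\omega_b} \max_{(\alpha,\beta) \in \Omega_e(b)} \left| \left( U U^* A_a V V^* \right)_{\alpha,\beta} \right|, \]

which, through simple manipulation, leads to

\[ \sum_{a \in [n_1] \times [n_2]} \left| \left\langle U U^* A_b V V^*, A_a \right\rangle \right|^2 \le \omega_b \sum_{a \in [n_1] \times [n_2]} \max_{(\alpha,\beta) \in \Omega_e(b)} \left| \left( U U^* A_a V V^* \right)_{\alpha,\beta} \right|^2. \]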

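Note that $\omega_a \|U U^* A_a V V^*\|_F^2 \le \|U U^*\|_F^2 = r$. If the energy of $U U^* A_a V V^*$ is spread out, i.e. most
entries of $\sqrt{\omega_a}\, U U^* A_a V V^*$ on $\Omega_e(b)$ are bounded by $O\left(\frac{\sqrt{r}}{n_1 n_2}\right)$, then one can bound

\[ \sum_{a \in [n_1] \times [n_2]} \omega_a \left| \left\langle U U^* A_b V V^*, A_a \right\rangle \right|^2 \le \omega_b \sum_{a \in [n_1] \times [n_2]} O\left( \frac{r}{n_1^2 n_2^2} \right) = O\left( \frac{r}{n_1 n_2} \right) \omega_b. \]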

In this situation, we have a good incoherence measure $\mu_3$.

3.1.4 Relations among Conditions (18), (19) and (20)



Although it is not easy to examine Conditions (19) and (20) directly through the properties of frequency
pairs, we can bound $\mu_2$ and $\mu_3$ using the knowledge of $\mu_1$. Their mutual relations are summarized in the
following lemma, which implies that $\mu_2 = O(\mu_1^2 r)$ and $\mu_3 = O(\mu_1^2 r)$.

Lemma 1. Define

\[ c_s := \max\left\{ \frac{n_1 n_2}{k_1 k_2},\ \frac{n_1 n_2}{(n_1 - k_1 + 1)(n_2 - k_2 + 1)} \right\}. \]

Then the incoherence measures $(\mu_1, \mu_2, \mu_3)$ of $X_e$ satisfy

\[ \mu_2 \le \mu_1^2 c_s^2 r, \quad \text{and} \quad \mu_3 \le \mu_1^2 c_s^2 r. \]

Proof. See Lemma 2 in Section 6. □



3.2 Theoretical Guarantees 

With the above incoherence measures, the main theoretical guarantees are supplied in the following three
theorems, each accounting for a distinct data model: (1) noiseless measurements, (2) measurements
contaminated by bounded noise, and (3) measurements corrupted by a constant portion of arbitrary outliers.

3.2.1 Exact Recovery from Noiseless Measurements 

The theorem below shows that exact recovery is possible under mild incoherence conditions. 

Theorem 1. Let X be a data matrix with matrix form $X = Y D Z^{\top}$, and $\Omega$ the random location set of size m. If
all measurements are noiseless, then there exists a constant $c_1 > 0$ such that under either of the following
conditions:

i) (strong incoherence condition) (18), (19) and (20) hold and

\[ m > c_1 \max(\mu_1 c_s, \mu_2, \mu_3 c_s)\, r \log^2(n_1 n_2); \tag{21} \]

ii) (weak incoherence condition) (18) holds and

m > cifilc^r^ log^(nin2); (22) 

X is the unique solution of EMaC with probability exceeding $1 - (n_1 n_2)^{-2}$.

Note that the result under condition ii) is an immediate consequence of that under condition i)
by Lemma 1. Theorem 1 states the following: (1) under strong incoherence condition (i.e. given that
$(\mu_1, \mu_2, \mu_3)$ are all constants), perfect recovery is possible as soon as the number of measurements exceeds
$O(r \log^2(n_1 n_2))$; (2) under weak incoherence condition (i.e. given only that $\mu_1$ is a constant), exact recovery
is possible from $O(r^2 \log^2(n_1 n_2))$ samples. Since there are at least $\Theta(r)$ degrees of freedom in total, the
lower bound should be no smaller than $\Theta(r)$². This establishes the orderwise optimality of EMaC under
strong incoherence condition except for a logarithmic gap.

We would like to emphasize that while we assume random observation models, the conditions imposed
on the data model are deterministic. This is different from [13], where randomness is assumed for both the
observation model and the data model.

3.2.2 Stable Recovery in the Presence of Bounded Noise 

Our method enables stable recovery even when the time domain samples are noisy copies of the true data. 
Here, we say the recovery is stable if the solution of Noisy-EMaC is "close" to the ground truth. To this end, 
we establish the following theorem, which is a counterpart of Theorem 1 in the noisy setting.

Theorem 2. Suppose $X^o$ is a noisy copy of X that satisfies $\|\mathcal{P}_\Omega(X - X^o)\|_F \le \delta$. Under the conditions of
Theorem 1, the solution $\hat{X}$ to Noisy-EMaC in (14) satisfies

\[ \|\hat{X}_e - X_e\|_F \le \left\{ 2\sqrt{n_1 n_2} + 8 n_1 n_2 + \frac{8\sqrt{2}\, n_1^2 n_2^2}{m} \right\} \delta \tag{23} \]

with probability exceeding $1 - (n_1 n_2)^{-2}$.

Proof. See Appendix J. □

Theorem 2 basically implies that the recovered enhanced matrix (which contains $\Theta(n_1^2 n_2^2)$ entries) is
close to the true enhanced matrix at high signal-to-noise ratio. In particular, the average entry inaccuracy
is bounded above by $O\left(\frac{n_1 n_2}{m} \delta\right)$. We note that in practice, Noisy-EMaC usually yields a much better estimate,
possibly by a polynomial factor. The practical applicability will be illustrated in Section 5 through numerical
examples.



² In fact, the minimum number of measurements may better be thought of as $\Theta(r \log(n_1 n_2))$ rather than $\Theta(r)$ due to a
coupon collector's effect.

3.2.3 Robust Recovery in the Presence of Sparse Outliers

The theoretical performance of Robust-EMaC is summarized in the following theorem. 

Theorem 3. Let X be a data matrix with matrix form $X = Y D Z^{\top}$, and $\Omega$ a random location set of size m. Set
$\lambda = \frac{1}{\sqrt{m \log(n_1 n_2)}}$, and assume s is some small positive constant. Suppose that the complex sign of S is
randomly generated such that $\mathbb{E}\, \mathrm{sgn}(S) = 0$. Then there exist constants $c_1, c_2 > 0$ such that under either of
the following conditions:

i) (strong incoherence condition) (18), (19) and (20) hold and

\[ m > c_1 \max(\mu_1 c_s, \mu_2, \mu_3 c_s)\, r \log^3(n_1 n_2), \tag{24} \]

ii) (weak incoherence condition) (18) holds and

m > ci^\c^r^ log {nin2), (25) 

then Robust-EMaC is exact, i.e. the minimizer $(\hat{M}, \hat{S})$ satisfies $\hat{M} = X$, with probability exceeding
$1 - (n_1 n_2)^{-2}$.

Theorem 3 specifies a candidate choice of regularization parameter $\lambda$ that allows orderwise optimality,
which only depends on the size of $\Omega$ but is otherwise data-independent. In practice, however, $\lambda$ may better
be selected via cross validation.

Furthermore, Theorem 3 demonstrates the possibility of robust recovery under a constant proportion
of sparse corruptions. It states the following: (1) under strong incoherence condition, robust recovery is
possible as soon as the number of measurements exceeds $O(r \log^3(n_1 n_2))$; (2) under weak incoherence
condition, robust recovery is possible from $O(r^2 \log^3(n_1 n_2))$ samples. Compared with Theorem 1, an additional
$O(\log(n_1 n_2))$ factor occurs. We note, however, that these conditions can be refined via finer tuning of
concentration of measure inequalities.

4 Structured Matrix Completion 

One problem closely related to our method is completion of multi-fold Hankel matrices from a small number
of entries. While each spectrally sparse signal can be mapped to a low-rank multi-fold Hankel matrix, it
is not clear whether all multi-fold Hankel matrices of rank r can be written as the enhanced form of an
object with spectral sparsity r. Therefore, one can think of recovery of multi-fold Hankel matrices as a
more general problem than the spectral compressed sensing problem. Indeed, Hankel matrix completion
has found numerous applications in system identification [35], [36], natural language processing [37], computer
vision [38], magnetic resonance imaging [39], etc.

There have been several works concerning algorithms and numerical experiments for Hankel matrix
completion [35], [36], [40]. However, to the best of our knowledge, there has been little theoretical guarantee that
directly addresses Hankel matrix completion. Our analysis framework can be straightforwardly adapted to
general K-fold Hankel matrix completion. Notice that $\mu_2$ and $\mu_3$ are defined using the SVD of $X_e$ in
(19) and (20), and we only need to modify the definition of $\mu_1$, as stated in the following theorem.

Theorem 4. Consider a K-fold Hankel matrix $X_e$ of rank r. The bounds in Theorems 1 and 3 continue
to hold, if the incoherence $\mu_1$ is defined as the smallest number that satisfies

$$\max_{(k,l) \in [n_1] \times [n_2]} \left\{ \|UU^* A_{(k,l)}\|_F^2, \ \|A_{(k,l)} VV^*\|_F^2 \right\} \le \frac{\mu_1 c_s r}{n_1 n_2}. \quad (26)$$

Proof. See Appendix K. □



Condition (26) requires that the left and right singular vectors are sufficiently uncorrelated with the
observation basis. In fact, condition (26) is a much weaker assumption than (18).

It is worth mentioning that low-rank Hankel matrices can often be converted to low-rank Toeplitz
counterparts. Both Hankel and Toeplitz matrices are important forms that capture the underlying harmonic
structures. Our results and analysis framework extend to the low-rank Toeplitz matrix completion problem
without difficulty.




5 Numerical Experiments 

In this section, we present numerical examples to evaluate the performance of the EMaC algorithm and its 
variants under different scenarios. We further examine the application of EMaC in image super resolution. 
Finally, we propose an extension of singular value thresholding (SVT) developed by Cai et al. [41] that
exploits the multi-fold Hankel structure to handle larger scale data sets.



5.1 Dirichlet Kernel 

To better understand the incoherence condition set in the theorem, we define the two-dimensional Dirichlet
kernel as

$$\mathcal{K}(f_1, f_2) := \frac{1}{k_1 k_2} \left( \frac{1 - e^{-j2\pi k_1 f_1}}{1 - e^{-j2\pi f_1}} \right) \left( \frac{1 - e^{-j2\pi k_2 f_2}}{1 - e^{-j2\pi f_2}} \right),$$

where $f_1, f_2 \in [-1/2, 1/2]$. Fig. 1(a) shows the amplitude of $\mathcal{K}(f_1, f_2)$ when $k = k_1 = k_2 = 6$. The $(i_1, i_2)$-th
entry of the matrix $G_L$ is then given as

$$(G_L)_{i_1, i_2} = \mathcal{K}(f_{1 i_1} - f_{1 i_2}, \ f_{2 i_1} - f_{2 i_2}).$$

The values of the off-diagonal entries decay in inverse proportion to the separation between the frequencies,
according to the Dirichlet kernel. We numerically evaluated the minimum eigenvalue of $G_L$ in Fig. 1(b)
for k = 6, 36, 72, where the spikes are randomly generated and the number of spikes is given as the
sparsity level. As we increase the dimension k, the minimum eigenvalue of $G_L$ gets closer to one, verifying our
theoretical argument in Section 3.1.
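The experiment described above can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes the Gram matrix is filled directly from the kernel evaluated at pairwise frequency differences, and the helper names (`dirichlet_1d`, `gram_matrix`, `min_eigenvalue`) as well as the choice of 4 spikes are hypothetical.

```python
import numpy as np

def dirichlet_1d(k, f):
    """(1/k) * sum_{a=0}^{k-1} exp(-2j*pi*a*f), i.e. the closed form
    (1/k) * (1 - e^{-j2*pi*k*f}) / (1 - e^{-j2*pi*f}) written as a finite sum."""
    f = np.asarray(f, dtype=float)
    a = np.arange(k)
    return np.exp(-2j * np.pi * np.multiply.outer(f, a)).sum(axis=-1) / k

def gram_matrix(freqs, k1, k2):
    """r x r matrix whose (i1, i2) entry is the 2-D kernel evaluated at the
    difference between the i1-th and i2-th frequency pairs."""
    freqs = np.asarray(freqs, dtype=float)
    d1 = freqs[:, 0][:, None] - freqs[:, 0][None, :]
    d2 = freqs[:, 1][:, None] - freqs[:, 1][None, :]
    return dirichlet_1d(k1, d1) * dirichlet_1d(k2, d2)

def min_eigenvalue(G):
    """Smallest eigenvalue of a Hermitian matrix."""
    return float(np.linalg.eigvalsh(G)[0])

# Illustrative run: random spikes, growing k (values are assumptions).
rng = np.random.default_rng(0)
for k in (6, 36, 72):
    G = gram_matrix(rng.uniform(0.0, 1.0, size=(4, 2)), k, k)
    print(k, min_eigenvalue(G))
```

Since the diagonal of the Gram matrix is identically one, its smallest eigenvalue can never exceed one; frequencies placed exactly on a grid of spacing 1/k give the identity matrix.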




Figure 1: (a) The two-dimensional Dirichlet kernel when $k = k_1 = k_2 = 6$ (horizontal axis: separation on x axis);
(b) The empirical distribution of the minimum eigenvalue $\sigma_{\min}(G_L)$ for different k with respect to the
sparsity level (horizontal axis: sparsity level).



5.2 Phase Transition in the Noiseless Setting 

To evaluate the practical ability of the EMaC algorithm, we conducted a series of numerical experiments
to examine the phase transition for exact recovery. A square enhanced form was adopted with $n_1 = n_2$,
which corresponds to the smallest $c_s$. For each (r, m) pair, 100 Monte Carlo trials were conducted. We
generated a spectrally sparse data matrix X by randomly generating r frequency spikes in [0,1] × [0,1],
and sampled a subset $\Omega$ of size m entries uniformly at random. The EMaC algorithm was conducted using
the convex programming modeling software CVX with the interior-point solver SDPT3 [12]. Each trial is
declared successful if the normalized mean squared error (NMSE) satisfies $\|\hat{X} - X\|_F / \|X\|_F \le 10^{-3}$, where
$\hat{X}$ denotes the estimate obtained through EMaC. The empirical success rate is calculated by averaging over
100 Monte Carlo trials.
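The data-generation step of this experiment can be sketched as follows. This is an illustrative sketch only: the function names, the unit amplitudes and the pencil parameters $k_1 = k_2 = 6$ are assumptions, and the convex solver itself (CVX/SDPT3 in the paper) is not reproduced. It shows that the two-fold Hankel (enhanced) form of a mixture of r sinusoids has rank at most r.

```python
import numpy as np

def spectrally_sparse_matrix(freqs, amps, n1, n2):
    """X[t1, t2] = sum_i amps[i] * exp(2j*pi*(t1*f1_i + t2*f2_i))."""
    freqs = np.asarray(freqs, dtype=float)
    t1 = np.arange(n1)[:, None, None]
    t2 = np.arange(n2)[None, :, None]
    phase = t1 * freqs[:, 0] + t2 * freqs[:, 1]          # shape (n1, n2, r)
    return (np.asarray(amps) * np.exp(2j * np.pi * phase)).sum(axis=-1)

def enhanced_form(X, k1, k2):
    """Two-fold Hankel matrix: entry ((a, c), (b, d)) equals X[a + b, c + d]."""
    n1, n2 = X.shape
    i1 = np.arange(k1)[:, None] + np.arange(n1 - k1 + 1)[None, :]
    i2 = np.arange(k2)[:, None] + np.arange(n2 - k2 + 1)[None, :]
    E = X[i1[:, None, :, None], i2[None, :, None, :]]    # (k1, k2, m1, m2)
    return E.reshape(k1 * k2, (n1 - k1 + 1) * (n2 - k2 + 1))

def nmse(X_hat, X):
    """Normalized error ||X_hat - X||_F / ||X||_F used as the success criterion."""
    return float(np.linalg.norm(X_hat - X) / np.linalg.norm(X))

rng = np.random.default_rng(0)
r, n = 3, 11
X = spectrally_sparse_matrix(rng.uniform(0, 1, size=(r, 2)), np.ones(r), n, n)
Xe = enhanced_form(X, 6, 6)
# Sampling set of size m, drawn uniformly at random without replacement.
omega = rng.choice(n * n, size=50, replace=False)
print(Xe.shape, omega.size)
```

An estimate returned by any solver can then be scored with `nmse` against the ground truth `X`.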






Fig. 2 illustrates the results of these Monte Carlo experiments when the dimensions of X are 11 × 11
and 15 × 15. The horizontal axis corresponds to the number m of samples revealed to the algorithm, while
the vertical axis corresponds to the spectral sparsity level r. The empirical success rate is reflected by the
color of each cell. It can be seen from the plot that the number of samples m required for successful recovery
grows approximately linearly with respect to the spectral sparsity r, and that the slopes of the phase transition
lines for the two cases are approximately the same. These observations are in line with our theoretical guarantee
in Theorem 1. This phase transition diagram validates the practical applicability of our algorithm in the
noiseless setting.







Figure 2: Phase transition plots where frequency locations are randomly generated (horizontal axis: m, the
number of samples; vertical axis: r, the spectral sparsity level). The plot (a) concerns the case where
$n_1 = n_2 = 11$, whereas the plot (b) corresponds to the situation where $n_1 = n_2 = 15$. The
empirical success rate is calculated by averaging over 100 Monte Carlo trials.



5.3 Stable Recovery from Noisy Data 

Fig. 3 further examines the stability of the proposed algorithm by performing Noisy-EMaC with respect to
different values of the parameter $\delta$ on a noise-free dataset of r = 4 complex sinusoids with $n_1 = n_2 = 11$. The
number of random samples is m = 50. The reconstructed NMSE grows approximately linearly with respect to $\delta$,
validating the stability of the proposed algorithm.

Figure 3: The reconstruction NMSE with respect to $\delta$ for a dataset with $n_1 = n_2 = 11$, r = 4 and m = 50.

5.4 Robust Line Spectrum Estimation

Consider the problem of line spectrum estimation, where the time domain measurements are contaminated by
a constant portion of outliers. We conducted a series of Monte Carlo trials to illustrate the phase transition
for perfect recovery of the ground truth. The true data X is assumed to be a 125-dimensional vector, where
the locations of the underlying frequencies are randomly generated. The simulations were again carried out
using CVX with SDPT3.

Fig. 4(a) shows the phase transition for robust line spectrum estimation when 10% of the entries are
corrupted, which showcases the tradeoff between the number m of measurements and the recoverable spectral
sparsity level r. One can see from the plot that m is approximately linear in r on the phase transition curve
even when 10% of the measurements are corrupted, which validates our finding in Theorem 3. Fig. 4(b)
illustrates the success rate of exact recovery when we obtain samples for all entry locations. This plot
illustrates the tradeoff between the spectral sparsity level and the number of outliers when all entries of
the corrupted $X^o$ are observed. It can be seen that there is a large region where exact recovery can be
guaranteed, demonstrating the power of our algorithms in the presence of sparse outliers.

Figure 4: Robust line spectrum estimation where mode locations are randomly generated: (a) Phase transition
plots when n = 125, and 10% of the entries are corrupted; the empirical success rate is calculated
by averaging over 100 Monte Carlo trials. (b) Phase transition plots when n = 125, and all the entries are
observed; the empirical success rate is calculated by averaging over 20 Monte Carlo trials.



5.5 Synthetic Super Resolution 

The proposed EMaC algorithm works beyond the random observation model in Theorem 1. Fig. 5 considers
a synthetic super resolution example motivated by [23], where the ground truth in Fig. 5(a) contains 6 point
sources with constant amplitude. The low-resolution observation in Fig. 5(b) is obtained by measuring the
low-frequency components $[-f_{lo}, f_{lo}]$ of the ground truth. Due to the large width of the associated point-spread
function, both the locations and amplitudes of the point sources are distorted in the low-resolution image.

We apply EMaC to extrapolate high-frequency components up to $[-f_{hi}, f_{hi}]$, where $f_{hi}/f_{lo} = 2$. The
reconstruction in Fig. 5(c) is obtained by directly applying the inverse Fourier transform to the extrapolated
spectrum, which avoids parameter estimation such as determining the number of modes. The resolution is greatly
enhanced compared with Fig. 5(b), suggesting that EMaC is a promising approach for super resolution tasks.

5.6 Singular Value Thresholding for EMaC 

The above Monte Carlo experiments were conducted using the advanced semidefinite programming solver
SDPT3. This and many other popular solvers (e.g. SeDuMi) are based on interior point methods, which
are typically inapplicable to large-scale data. In fact, SDPT3 fails to handle an n × n data matrix when n
exceeds 19, which corresponds to a 100 × 100 enhanced matrix.









Figure 5: A synthetic super resolution example with panels (a) ground truth, (b) low-resolution observation,
and (c) high-resolution reconstruction. The observation (b) is taken from the low-frequency
components of the ground truth in (a), and the reconstruction (c) is done via inverse Fourier transform of
the extrapolated high-frequency components.

One alternative for large-scale data is to use first-order algorithms tailored for matrix completion
problems, e.g. the singular value thresholding (SVT) algorithm [41]. We propose a modified SVT algorithm in
Algorithm 1 to exploit the Hankel structure.



Algorithm 1 Singular Value Thresholding for EMaC.
Input: The observed data matrix $X^o$ on the location set $\Omega$.

initialize: let $X_e^o$ denote the enhanced form of $\mathcal{P}_\Omega(X^o)$; set $M_0 = X_e^o$ and t = 0.
repeat

1) $Q_t \leftarrow \mathcal{D}_{\tau_t}(M_t)$

2) $M_{t+1} \leftarrow \mathcal{H}_{X^o}(Q_t)$

3) $t \leftarrow t + 1$
until convergence

output $\hat{X}$ as the data matrix with enhanced form $M_t$.



In particular, two operators are defined as follows:

• $\mathcal{D}_{\tau_t}(\cdot)$ in Algorithm 1 denotes the singular value shrinkage operator. Specifically, if the SVD of X is
given by $X = U \Sigma V^*$ with $\Sigma = \mathrm{diag}(\{\sigma_i\})$, then

$$\mathcal{D}_{\tau_t}(X) := U \,\mathrm{diag}(\{(\sigma_i - \tau_t)_+\})\, V^*,$$

where $\tau_t > 0$ is the soft-thresholding level.

• In the K-dimensional frequency model, $\mathcal{H}_{X^o}(Q_t)$ denotes the projection of $Q_t$ onto the subspace of
enhanced matrices (i.e. K-fold Hankel matrices) that are consistent with the observed entries.

Consequently, each iteration first performs singular value shrinkage to obtain $Q_t$ and then projects
the outcome onto the space of K-fold Hankel matrices that are consistent with the observed
entries.
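The shrink-then-project iteration can be sketched for the simplest case of a one-dimensional signal, where the enhanced form is an ordinary Hankel matrix. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the stopping rule is a fixed iteration count, and the decaying threshold `0.1 * sigma_max / ceil((t + 1) / 10)` is an assumed schedule in the spirit of the empirical choice quoted with Fig. 6.

```python
import math
import numpy as np

def hankel_matrix(x, k):
    """k x (n - k + 1) Hankel matrix with entry (a, b) equal to x[a + b]."""
    idx = np.arange(k)[:, None] + np.arange(len(x) - k + 1)[None, :]
    return np.asarray(x)[idx]

def hankel_average(M, n):
    """Average each anti-diagonal of M; this is the orthogonal projection onto
    Hankel matrices, returned as the underlying length-n signal."""
    k, cols = M.shape
    idx = np.arange(k)[:, None] + np.arange(cols)[None, :]
    sums = np.zeros(n, dtype=complex)
    counts = np.zeros(n)
    np.add.at(sums, idx, M)
    np.add.at(counts, idx, 1.0)
    return sums / counts

def shrink(M, tau):
    """Singular value shrinkage: soft-threshold the singular values by tau."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vh

def svt_hankel(x_obs, mask, k, n_iter=300):
    """Alternate shrinkage and projection onto Hankel matrices that agree with
    the observed samples (mask[i] is True where x_obs[i] is observed)."""
    x_obs = np.asarray(x_obs, dtype=complex)
    mask = np.asarray(mask, dtype=bool)
    n = x_obs.size
    M = hankel_matrix(np.where(mask, x_obs, 0), k)
    x = hankel_average(M, n)
    for t in range(n_iter):
        tau = 0.1 * np.linalg.norm(M, 2) / math.ceil((t + 1) / 10)  # assumed schedule
        Q = shrink(M, tau)
        x = hankel_average(Q, n)      # nearest Hankel matrix to Q
        x[mask] = x_obs[mask]         # enforce consistency with observations
        M = hankel_matrix(x, k)
    return x
```

Averaging anti-diagonals and then resetting the observed samples is exactly the Frobenius-norm projection onto the affine set of Hankel matrices that match the observations, which is why the two steps are combined in one pass.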

Fig. 6 illustrates the performance of Algorithm 1. We generated a true 101 × 101 data matrix X through a
superposition of 30 random complex sinusoids, and revealed 5.8% of the total entries (i.e. m = 600) uniformly
at random. The noise was i.i.d. Gaussian, giving a signal-to-noise amplitude ratio of 10. The reconstructed
vectorized signal is superimposed on the ground truth in Fig. 6. The normalized reconstruction error was
$\|\hat{X} - X\|_F / \|X\|_F = 0.1098$, validating the stability of our algorithm in the presence of noise.

Figure 6: The performance of SVT for Noisy-EMaC for a 101 × 101 data matrix that contains 30 random
frequency spikes. 5.8% of all entries (m = 600) are observed with signal-to-noise amplitude ratio 10. Here,
$\tau_t = 0.1\,\sigma_{\max}(M_t) / \lceil t/10 \rceil$ empirically. For concreteness, the reconstructed data against the true data for the
first 100 time instances (after vectorization) are plotted.

6 Proof of Theorem 1

EMaC has a similar spirit to the well-known matrix completion algorithms [27], [28], except that we impose
Hankel and multi-fold Hankel structures on the matrices. While [28] has presented a general sufficient condition
for exact recovery (see [28, Theorem 3]), the basis in our case does not exhibit a good coherence property as
required in [28], and hence these results cannot yield useful estimates in our framework. Nevertheless, the
beautiful golfing scheme introduced in [28] lays the foundation of our analysis in the sequel.

For concreteness, the analysis in this paper focuses on recovering harmonically sparse signals as stated in
Theorem 1, since proving Theorem 1 is slightly more involved than proving Theorem 4. We note, however,
that our analysis already entails all reasoning required for Theorem 4. In this section, we restrict our
attention to the real case (i.e. X is real-valued) for simplicity*.

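Before proceeding to the proof, we would first like to stress that the incoherence measures $(\mu_1, \mu_2, \mu_3)$
are not independent. In addition to $(\mu_1, \mu_2, \mu_3)$, we define another measure $\mu_4$ as the smallest number that
satisfies

$$\forall b \in [n_1] \times [n_2]: \quad \sum_{a \in [n_1] \times [n_2]} \omega_a \, |\langle \mathcal{P}_T A_b, A_a \rangle|^2 \le \frac{\mu_4 r}{n_1 n_2}\, \omega_b. \quad (27)$$

Some of their mutual connections are listed as follows.

Lemma 2. Suppose that $X_e$ has incoherence $(\mu_1, \mu_2, \mu_3, \mu_4)$. We have the following.

1. $G_L = E_L^* E_L$, and $G_R = (E_R E_R^*)^T$;

2. For any $a, b \in [n_1] \times [n_2]$, one has

$$|\langle A_b, \mathcal{P}_T A_a \rangle| \le \sqrt{\frac{\omega_b}{\omega_a}} \, \frac{3 \mu_1 c_s r}{n_1 n_2}; \quad (28)$$

3. The incoherence measures satisfy

$$\mu_2 \le \mu_1^2 c_s^2 r, \qquad \mu_3 \le \mu_1^2 c_s^2 r, \quad (29)$$

and

$$\mu_4 \le 9 \mu_1^2 c_s^2 r;$$

4. The measure $\mu_4$ can be bounded by $\mu_1$ and $\mu_3$ as follows

$$\mu_4 \le 6 \mu_1 c_s + 3 \mu_3 c_s. \quad (30)$$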

*All of the proof arguments in this paper work for Hermitian structured matrices (e.g. Hermitian Toeplitz matrices), and
apply also to the non-Hermitian complex case with some minor adjustment in the constants (see, e.g., [28, Section III.D]).

Proof. See Appendix A. □

Note that the above lemma indicates that our new incoherence measure $\mu_4$ can be bounded by the sum
of $\mu_1$ and $\mu_3$ up to some multiplicative constant. In fact, we will prove instead the following theorem based
on $(\mu_1, \mu_2, \mu_4)$, which is slightly more general than Theorem 1.

Theorem 5. Suppose that X has incoherence measure $(\mu_1, \mu_2, \mu_3, \mu_4)$. If

$$m > c_0 \max(\mu_1 c_s, \mu_2, \mu_4)\, r \log^2(n_1 n_2),$$

then X is the unique solution of EMaC with probability exceeding $1 - (n_1 n_2)^{-2}$.

Note that Theorem 1 can be delivered as an immediate consequence of Theorem 5 by exploiting the
relations among $(\mu_1, \mu_2, \mu_3, \mu_4)$ given in Lemma 2.

6.1 Dual Certification 

Denote by $\mathcal{A}_{(k,l)}(M)$ the projection of M onto the subspace spanned by $A_{(k,l)}$, and define the projection
operator onto the space spanned by all $A_{(k,l)}$ and its orthogonal complement as

$$\mathcal{A} := \sum_{(k,l) \in [n_1] \times [n_2]} \mathcal{A}_{(k,l)}, \quad \text{and} \quad \mathcal{A}^\perp = \mathcal{I} - \mathcal{A}. \quad (31)$$

Here, $\{\mathcal{A}^\perp(M)\}$ spans a $[k_1 k_2 (n_1 - k_1 + 1)(n_2 - k_2 + 1) - n_1 n_2]$-dimensional subspace.

There are two common ways to describe the randomness of $\Omega$: one corresponds to sampling without
replacement, and another concerns sampling with replacement (i.e. $\Omega$ contains m indices
$\{a_i \in [n_1] \times [n_2] : 1 \le i \le m\}$ that are i.i.d. generated). As discussed in [28, Section II.A], while both
situations result in the same order-wise bounds, the latter situation admits simpler analysis due to independence.
Therefore, we will assume that $\Omega$ is a multiset (possibly with repeated elements) and that the $a_i$'s are
independently and uniformly distributed throughout the proofs of this paper, and define the associated operators as

$$\mathcal{A}_\Omega := \sum_{i=1}^{m} \mathcal{A}_{a_i}. \quad (32)$$

We also define another projection operator $\mathcal{A}'_\Omega$ similar to (32), but with the sum extending only over distinct
samples. Its complement operator is defined as $\mathcal{A}'_{\Omega^\perp} := \mathcal{A} - \mathcal{A}'_\Omega$. Note that $\mathcal{A}_\Omega(M) = 0$ is equivalent to
$\mathcal{A}'_\Omega(M) = 0$.

With these definitions, EMaC can be rewritten as the following general matrix completion problem:

$$\begin{aligned} \underset{M}{\text{minimize}} \quad & \|M\|_* \\ \text{subject to} \quad & \mathcal{A}'_\Omega(M) = \mathcal{A}'_\Omega(X_e), \\ & \mathcal{A}^\perp(M) = \mathcal{A}^\perp(X_e) = 0. \end{aligned} \quad (33)$$

To prove exact recovery via convex optimization, it suffices to produce an appropriate dual certificate, as
stated in the following lemma.

Lemma 3. Consider a location set $\Omega$ that contains m random indices. Suppose that the sampling operator $\mathcal{A}_\Omega$
obeys

$$\left\| \mathcal{P}_T \mathcal{A} \mathcal{P}_T - \frac{n_1 n_2}{m} \mathcal{P}_T \mathcal{A}_\Omega \mathcal{P}_T \right\| \le \frac{1}{2}. \quad (34)$$

If there exists a matrix W that obeys

$$\mathcal{A}'_{\Omega^\perp}(UV^* + W) = 0, \quad (35)$$

$$\|\mathcal{P}_T(W)\|_F \le \frac{1}{2 n_1^2 n_2^2}, \quad (36)$$

and

$$\|\mathcal{P}_{T^\perp}(W)\| \le \frac{1}{2}, \quad (37)$$

then $X_e$ is the unique optimizer of (33) or, equivalently, X is the unique minimizer of EMaC.

Proof. See Appendix B. □

Condition (34) will be analyzed in Section 6.2, while a valid certificate W will be constructed in Section 6.3.
These are the objectives of the remaining part of the section.



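6.2 Deviation of $\| \mathcal{P}_T \mathcal{A} \mathcal{P}_T - \frac{n_1 n_2}{m} \mathcal{P}_T \mathcal{A}_\Omega \mathcal{P}_T \|$

Lemma 3 requires that $\mathcal{A}_\Omega$ is sufficiently incoherent with respect to T. The following lemma quantifies the
projection of each $A_{(k,l)}$ onto the tangent space T.

Lemma 4. Suppose that (18) holds, then

$$\|UU^* A_{(k,l)}\|_F^2 \le \frac{\mu_1 c_s r}{n_1 n_2}, \quad \|A_{(k,l)} VV^*\|_F^2 \le \frac{\mu_1 c_s r}{n_1 n_2}, \quad \|\mathcal{P}_T(A_{(k,l)})\|_F^2 \le \frac{2 \mu_1 c_s r}{n_1 n_2} \quad (38)$$

for all $(k,l) \in [n_1] \times [n_2]$.

Proof. See Appendix C. □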

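As long as (38) holds, the deviation of $\mathcal{P}_T \mathcal{A}_\Omega \mathcal{P}_T$ can be bounded reasonably well in the following lemma.
This establishes Condition (34) required by Lemma 3.

Lemma 5. Suppose that

$$\|\mathcal{P}_T(A_{(k,l)})\|_F^2 \le \frac{2 \mu_1 c_s r}{n_1 n_2}$$

for $(k,l) \in [n_1] \times [n_2]$. Then for any small constant $\delta < 2$, one has

$$\left\| \frac{n_1 n_2}{m} \mathcal{P}_T \mathcal{A}_\Omega \mathcal{P}_T - \mathcal{P}_T \mathcal{A} \mathcal{P}_T \right\| \le \delta \quad (39)$$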

with probability exceeding 1 — 2nin2 exp I — ^^ ™ ^ ) . 

Proof. See Appendix D. □

The above two lemmas taken collectively lead to the following fact: for any given constant $\epsilon < e^{-1} < \frac{1}{2}$,
$\| \frac{n_1 n_2}{m} \mathcal{P}_T \mathcal{A}_\Omega \mathcal{P}_T - \mathcal{P}_T \mathcal{A} \mathcal{P}_T \| \le \epsilon$ holds with high probability, provided that
$m > c_1 \mu_1 c_s r \log(n_1 n_2)$ for some constant $c_1 > 0$.



6.3 Construction of Dual Certificate 

Now we are in a position to construct the dual certificate, for which we will employ the golfing scheme
introduced in [28]. Suppose that we generate $j_0$ independent random location multisets $\Omega_i$ ($1 \le i \le j_0$),
each containing $m / j_0$ i.i.d. samples. This way the distribution of $\Omega$ is the same as $\Omega_1 \cup \Omega_2 \cup \cdots \cup \Omega_{j_0}$. Note
that the $\Omega_i$'s correspond to sampling with replacement. Let $\rho := \frac{m}{n_1 n_2}$ and $q := \frac{\rho}{j_0}$ denote the undersampling
factors of $\Omega$ and $\Omega_i$, respectively.

Consider a small constant $\epsilon < \frac{1}{e}$, and choose $j_0 := 3 \log_{1/\epsilon}(n_1 n_2)$. The construction of the dual then
proceeds as follows:

Construction of a dual certificate W via the golfing scheme.

1. Set $B_0 = 0$, and $j_0 := 3 \log_{1/\epsilon}(n_1 n_2)$.

2. For all i ($1 \le i \le j_0$), let $B_i = B_{i-1} + \left( \frac{1}{q} \mathcal{A}_{\Omega_i} + \mathcal{A}^\perp \right) \mathcal{P}_T (UV^* - B_{i-1})$.

3. Set $W := -(UV^* - B_{j_0})$.
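The three steps above can be sketched numerically for an ordinary (one-fold) Hankel matrix, where the basis matrices $A_k$ are the normalized anti-diagonal indicators. This is an illustrative sketch under stated assumptions, not the construction used in the proofs: the function names are hypothetical, the sample batches are passed in explicitly, and q is taken to be the batch size divided by the number of anti-diagonals.

```python
import numpy as np

def antidiagonal_index(k1, k2):
    """idx[a, b] = a + b, the anti-diagonal that entry (a, b) belongs to."""
    return np.arange(k1)[:, None] + np.arange(k2)[None, :]

def basis_coeffs(M, idx, omega):
    """Coefficients <A_k, M>, with A_k = (indicator of anti-diagonal k) / sqrt(omega_k)."""
    sums = np.zeros(omega.size, dtype=complex)
    np.add.at(sums, idx, M)
    return sums / np.sqrt(omega)

def from_coeffs(c, idx, omega):
    """The Hankel matrix sum_k c_k A_k."""
    return (c / np.sqrt(omega))[idx]

def project_tangent(Z, U, V):
    """P_T(Z) = U U* Z + Z V V* - U U* Z V V*."""
    PU = U @ U.conj().T
    PV = V @ V.conj().T
    return PU @ Z + Z @ PV - PU @ Z @ PV

def golfing_certificate(U, V, batches, idx, omega):
    """Run B_i = B_{i-1} + (A_{Omega_i}/q + A_perp) P_T(UV* - B_{i-1}); return W = B - UV*."""
    n = omega.size
    UV = U @ V.conj().T
    B = np.zeros_like(UV)
    for batch in batches:                       # each batch: sampled indices, repeats allowed
        q = len(batch) / n
        G = project_tangent(UV - B, U, V)
        c = basis_coeffs(G, idx, omega)
        counts = np.bincount(batch, minlength=n)
        sampled = from_coeffs(counts * c, idx, omega)   # A_{Omega_i}(G), with multiplicity
        perp = G - from_coeffs(c, idx, omega)           # A_perp(G)
        B = B + sampled / q + perp
    return B - UV
```

By construction $UV^* + W = B_{j_0}$ has no component along any basis matrix whose index was never sampled, which is the support condition the certificate must satisfy; the norm conditions, by contrast, hold only with high probability and are not guaranteed by this sketch.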

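We will establish that W is a valid dual certificate if we can show that W satisfies the conditions stated
in Lemma 3, which we will verify step by step.
First, by construction, we have the identities

$$(\mathcal{A}'_\Omega + \mathcal{A}^\perp)(B_i) = B_i$$

for all $1 \le i \le j_0$. Since $UV^* + W = B_{j_0}$, this validates that $\mathcal{A}'_{\Omega^\perp}(UV^* + W) = 0$, as required in (35).
Secondly, if one defines the deviation of $B_i$ from $UV^*$ as

$$F_i := UV^* - B_i,$$

and hence $W = -F_{j_0}$, then one can verify that

$$\mathcal{P}_T(F_i) = \mathcal{P}_T(UV^*) - \mathcal{P}_T\left( B_{i-1} + \left( \frac{1}{q}\mathcal{A}_{\Omega_i} + \mathcal{A}^\perp \right) \mathcal{P}_T (UV^* - B_{i-1}) \right) = \left( \mathcal{P}_T - \mathcal{P}_T \left( \frac{1}{q}\mathcal{A}_{\Omega_i} + \mathcal{A}^\perp \right) \mathcal{P}_T \right)(F_{i-1}).$$

Lemma 5 asserts the following: if $q n_1 n_2 > c_1 \mu_1 c_s r \log(n_1 n_2)$ or, equivalently, $m > \tilde{c}_1 \mu_1 c_s r \log^2(n_1 n_2)$, then
with overwhelming probability one has

$$\left\| \mathcal{P}_T - \mathcal{P}_T \left( \frac{1}{q}\mathcal{A}_{\Omega_i} + \mathcal{A}^\perp \right) \mathcal{P}_T \right\| = \left\| \mathcal{P}_T \mathcal{A} \mathcal{P}_T - \frac{1}{q} \mathcal{P}_T \mathcal{A}_{\Omega_i} \mathcal{P}_T \right\| \le \epsilon < \frac{1}{2}.$$

This allows us to bound $\|\mathcal{P}_T(F_i)\|_F$ as follows

$$\|\mathcal{P}_T(F_i)\|_F \le \epsilon^i \|\mathcal{P}_T(F_0)\|_F \le \epsilon^i \|UV^*\|_F = \epsilon^i \sqrt{r},$$

which immediately validates Condition (36):

$$\|\mathcal{P}_T(W)\|_F = \|\mathcal{P}_T(F_{j_0})\|_F \le \epsilon^{j_0} \sqrt{r} < \frac{1}{2 n_1^2 n_2^2}.$$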

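Finally, it remains to show that $\|\mathcal{P}_{T^\perp}(W)\| \le \frac{1}{2}$. For any $F \in T$, define the following homogeneity
measure

$$\nu(F) := \max_{(k,l) \in [n_1] \times [n_2]} \frac{1}{\omega_{k,l}} \left| \langle A_{(k,l)}, F \rangle \right|^2, \quad (40)$$

which largely relies on the average per-entry energy in each skew diagonal. We would like to show that
$\nu\left( \left( \mathcal{P}_T - \mathcal{P}_T \left( \frac{1}{q}\mathcal{A}_{\Omega_i} + \mathcal{A}^\perp \right) \mathcal{P}_T \right) F \right) \le \frac{1}{4} \nu(F)$ with high probability. This is supplied in the following lemma.

Lemma 6. Consider any given $F \in T$, and suppose that (18) and (27) hold. If the following bound holds,

$$m > c_7 \max\{\mu_4, \mu_1 c_s\}\, r \log^2(n_1 n_2),$$

then one has

$$\nu\left( \left( \mathcal{P}_T - \mathcal{P}_T \left( \frac{1}{q}\mathcal{A}_{\Omega_i} + \mathcal{A}^\perp \right) \mathcal{P}_T \right) F \right) \le \frac{1}{4} \nu(F) \quad (41)$$

for all $1 \le i \le j_0$ with probability exceeding $1 - (n_1 n_2)^{-3}$.

Proof. See Appendix E. □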

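This lemma basically indicates that a homogeneous F with respect to the observation basis typically
results in a homogeneous $\left( \mathcal{P}_T - \mathcal{P}_T \left( \frac{1}{q}\mathcal{A}_{\Omega_i} + \mathcal{A}^\perp \right) \mathcal{P}_T \right)(F)$, and hence we can hope that the homogeneity
condition (19) of $F_0$ can carry over to every $\mathcal{P}_T(F_i)$ ($1 \le i \le j_0$).
Observe that Condition (19) is equivalent to saying

$$\nu(F_0) = \max_{(k,l) \in [n_1] \times [n_2]} \frac{1}{\omega_{k,l}} \left| \langle A_{(k,l)}, UV^* \rangle \right|^2 = \max_{(k,l) \in [n_1] \times [n_2]} \frac{1}{\omega_{k,l}^2} \Big| \sum_{(\alpha,\beta) \in \Omega_e(k,l)} (UV^*)_{\alpha,\beta} \Big|^2 \le \frac{\mu_2 r}{(n_1 n_2)^2}.$$

One can then verify that for every i ($0 \le i \le j_0$),

$$\nu(\mathcal{P}_T(F_i)) \le \frac{1}{4} \nu(\mathcal{P}_T(F_{i-1})) \le \left( \frac{1}{4} \right)^i \nu(F_0) \le \left( \frac{1}{4} \right)^i \frac{\mu_2 r}{(n_1 n_2)^2}$$

holds with high probability if $m > c_7 \max\{\mu_4, \mu_1 c_s\}\, r \log^2(n_1 n_2)$ for some constant $c_7 > 0$.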



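The following lemma then relates the homogeneity measure of F with the spectral norm of
$\mathcal{P}_{T^\perp} \left( \frac{1}{q}\mathcal{A}_{\Omega_i} + \mathcal{A}^\perp \right)(F)$.

Lemma 7. Consider any given $F \in T$. Then there exist positive constants $c_8$ and $c_9$ such that
for any $t \le \sqrt{\nu(F)}\, n_1 n_2$,

$$\left\| \mathcal{P}_{T^\perp} \left( \frac{1}{q}\mathcal{A}_{\Omega_i} + \mathcal{A}^\perp \right)(F) \right\| > t$$

holds with probability at most $c_8 n_1 n_2 \exp\left( - \frac{c_9 t^2 q}{\nu(F)\, n_1 n_2} \right)$.

Proof. See Appendix F. □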

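Since $\nu(F_i) \le \left( \frac{1}{4} \right)^i \frac{\mu_2 r}{(n_1 n_2)^2}$ for all $1 \le i \le j_0$ with high probability, one can bound

$$\sqrt{\nu(F_i)}\, n_1 n_2 \le \left( \frac{1}{2} \right)^i \sqrt{\mu_2 r}.$$

Lemma 7 immediately yields that for all i ($0 \le i < j_0$),

$$\mathbb{P}\left( \left\| \mathcal{P}_{T^\perp} \left( \tfrac{1}{q}\mathcal{A}_{\Omega_{i+1}} + \mathcal{A}^\perp \right)(F_i) \right\| \le \left( \tfrac{1}{2} \right)^{i+2} \right) \ge \mathbb{P}\left( \left\| \mathcal{P}_{T^\perp} \left( \tfrac{1}{q}\mathcal{A}_{\Omega_{i+1}} + \mathcal{A}^\perp \right)(F_i) \right\| \le \frac{\sqrt{\nu(F_i)}\, n_1 n_2}{\sqrt{16 \mu_2 r}} \right) \ge 1 - c_8 n_1 n_2 \exp\left( - \frac{c_9\, q\, n_1 n_2}{16 \mu_2 r} \right),$$

which is polynomially close to one in $n_1 n_2$ if $q n_1 n_2 > c_{12} \max(\mu_1 c_s, \mu_4, \mu_2)\, r \log(n_1 n_2)$ for some constant
$c_{12} > 0$. This is also equivalent to

$$m > c_{13} \max(\mu_1 c_s, \mu_4, \mu_2)\, r \log^2(n_1 n_2)$$

for some constant $c_{13} > 0$. Under this condition, we can conclude

$$\|\mathcal{P}_{T^\perp}(W)\| \le \sum_{i=0}^{j_0 - 1} \left\| \mathcal{P}_{T^\perp} \left( \tfrac{1}{q}\mathcal{A}_{\Omega_{i+1}} + \mathcal{A}^\perp \right)(F_i) \right\| \le \sum_{i=0}^{j_0 - 1} \left( \frac{1}{2} \right)^{i+2} < \frac{1}{2}.$$

So far, we have successfully established that with high probability, W is a valid dual certificate, and
hence EMaC admits perfect reconstruction of X.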

7 Proof of Theorem 3

The algorithm Robust-EMaC has a similar spirit to the well-known robust principal component analysis [10],
[29] that seeks a decomposition of low-rank plus sparse matrices, except that we impose multi-fold Hankel
structures on both the low-rank and sparse matrices. Similar to the proof for Theorem 1, the proof is based
on duality analysis, and we rely on the golfing scheme introduced in [28] to construct a valid dual certificate.
In this section, we prove the results for a slightly different sampling model as follows.

• The location multiset $\Omega^{clean}$ of observed uncorrupted entries is generated by sampling $(1 - s)\rho n_1 n_2$
i.i.d. entries uniformly at random.

• The location multiset $\Omega$ of observed entries is generated by sampling $\rho n_1 n_2$ i.i.d. entries uniformly at
random, with the first $(1 - s)\rho n_1 n_2$ entries coming from $\Omega^{clean}$.

• The location set $\Omega^{dirty}$ of observed corrupted entries is given by $\Omega' \setminus \Omega^{clean\prime}$, where $\Omega'$ and $\Omega^{clean\prime}$ denote
the sets of distinct entry locations in $\Omega$ and $\Omega^{clean}$, respectively.
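The three location sets above can be generated as in the following sketch. It is a minimal illustration in which the function name, the rounding of $\rho n_1 n_2$ and $(1 - s)\rho n_1 n_2$ to integers, and the flattened indexing of entry locations are all assumptions.

```python
import numpy as np

def sample_locations(n1, n2, rho, s, rng):
    """Draw the multisets Omega and Omega_clean and the set Omega_dirty.

    Locations are flattened indices in [0, n1*n2). Sampling is i.i.d. uniform
    with replacement, so Omega and Omega_clean may contain repeats.
    """
    total = n1 * n2
    m = int(round(rho * total))
    m_clean = int(round((1 - s) * m))
    omega = rng.integers(0, total, size=m)      # multiset Omega
    omega_clean = omega[:m_clean]               # first (1 - s) fraction: Omega_clean
    distinct = np.unique(omega)                 # distinct locations of Omega
    distinct_clean = np.unique(omega_clean)     # distinct locations of Omega_clean
    omega_dirty = np.setdiff1d(distinct, distinct_clean)
    return omega, omega_clean, omega_dirty

omega, omega_clean, omega_dirty = sample_locations(11, 11, 0.5, 0.1, np.random.default_rng(0))
print(omega.size, omega_clean.size, omega_dirty.size)
```

Because the dirty set is defined through distinct locations, a location drawn both among the first $(1 - s)\rho n_1 n_2$ samples and among the remaining ones counts as clean, so the number of corrupted locations is at most $s \rho n_1 n_2$.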

As mentioned in the proof of Theorem 1, this slightly different sampling model, while resulting in the same
order-wise bounds, significantly simplifies the analysis due to the independence assumptions.

Similar to the proof for the noiseless setting, we will prove our results for the real case under the
incoherence measures $(\mu_1, \mu_2, \mu_4)$ as introduced in Section 6. Moreover, we will prove the theorem under
a stronger randomness condition, that is, the signs of all non-zero entries of S are independent zero-mean
random variables. Specifically, we will prove the following theorem.

Theorem 6. Suppose that X has incoherence measure $(\mu_1, \mu_2, \mu_3, \mu_4)$, and let $\lambda = \frac{1}{\sqrt{m \log(n_1 n_2)}}$. Assume
that s is some small positive constant, and that the signs of nonzero entries of S are independently generated
with mean 0. If

$$m > c_0 \max(\mu_1 c_s, \mu_2, \mu_4)\, r \log^3(n_1 n_2),$$

then Robust-EMaC succeeds in recovering X with probability exceeding $1 - (n_1 n_2)^{-2}$.

A simple derandomization argument introduced in [29, Section 2.2] immediately suggests that the
performance of Robust-EMaC under the fixed-sign pattern is no worse than that under the random-sign pattern
with sparsity parameter 2s. Therefore, we will focus on the random sign pattern, which is much easier to
analyze.

The same argument as in Section 6 indicates that Theorem 3 can be delivered as an immediate
consequence of Theorem 6.

7.1 Dual Certification 

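We adopt similar notations as in Section 6.1. That said, if we generate $\rho n_1 n_2$ i.i.d. entry locations $a_i$'s
uniformly at random, and let the multisets $\Omega$ and $\Omega^{clean}$ contain respectively $\{a_i \mid 1 \le i \le \rho n_1 n_2\}$ and
$\{a_i \mid 1 \le i \le \rho(1 - s) n_1 n_2\}$, then

$$\mathcal{A}_\Omega := \sum_{i=1}^{\rho n_1 n_2} \mathcal{A}_{a_i}, \quad \text{and} \quad \mathcal{A}_{\Omega^{clean}} := \sum_{i=1}^{\rho(1-s) n_1 n_2} \mathcal{A}_{a_i},$$

corresponding to sampling with replacement. Besides, $\mathcal{A}'_\Omega$ (resp. $\mathcal{A}'_{\Omega^{clean}}$) is defined similar to $\mathcal{A}_\Omega$ (resp.
$\mathcal{A}_{\Omega^{clean}}$), but with the sum extending only over distinct samples.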

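We will establish that exact recovery can be guaranteed, if we can produce a valid dual certificate as
follows.

Lemma 8. Suppose that s is some small positive constant. Suppose that the associated sampling operator
$\mathcal{A}_{\Omega^{clean}}$ obeys

$$\left\| \mathcal{P}_T \mathcal{A} \mathcal{P}_T - \frac{1}{\rho(1-s)} \mathcal{P}_T \mathcal{A}_{\Omega^{clean}} \mathcal{P}_T \right\| \le \frac{1}{2}, \quad (42)$$

and

$$\|\mathcal{A}_{\Omega^{clean}}(M)\|_F \le 10 \log(n_1 n_2)\, \|\mathcal{A}'_{\Omega^{clean}}(M)\|_F, \quad (43)$$

for any matrix M. If there exist a regularization parameter $\lambda$ ($0 < \lambda < 1$) and a matrix W obeying

$$\begin{cases} \|\mathcal{P}_T(W + \lambda\, \mathrm{sgn}(S_e) - UV^*)\|_F \le \frac{\lambda}{n_1^2 n_2^2}, \\ \|\mathcal{P}_{T^\perp}(W + \lambda\, \mathrm{sgn}(S_e))\| \le \frac{1}{4}, \\ \mathcal{A}'_{(\Omega^{clean})^\perp}(W) = 0, \\ \|\mathcal{A}'_{\Omega^{clean}}(W)\|_\infty \le \frac{\lambda}{4}, \end{cases} \quad (44)$$

then Robust-EMaC is exact, i.e. the minimizer $(\hat{M}, \hat{S})$ satisfies $\hat{M} = X$.

Proof. See Appendix G. □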



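We note that a good bound on $\left\| \mathcal{P}_T \mathcal{A} \mathcal{P}_T - \frac{1}{\rho(1-s)} \mathcal{P}_T \mathcal{A}_{\Omega^{clean}} \mathcal{P}_T \right\|$ can be found using Lemma 5. Specifically,
there exists some constant $c_1 > 0$ such that if $\rho(1 - s) n_1 n_2 > c_1 \mu_1 c_s r \log(n_1 n_2)$, then one has

$$\left\| \mathcal{P}_T \mathcal{A} \mathcal{P}_T - \frac{1}{\rho(1-s)} \mathcal{P}_T \mathcal{A}_{\Omega^{clean}} \mathcal{P}_T \right\| \le \frac{1}{2}$$

with high probability. Besides, a simple Chernoff bound [43] indicates that, with high probability,
none of the entries is sampled more than $10 \log(n_1 n_2)$ times. Equivalently, with high probability,

$$\forall M: \quad \|\mathcal{A}_{\Omega^{clean}}(M)\|_F \le 10 \log(n_1 n_2)\, \|\mathcal{A}'_{\Omega^{clean}}(M)\|_F.$$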
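The multiplicity claim can be checked empirically with a short simulation. This is only a sanity-check sketch: the function name, the problem size (an 11 × 11 grid with 50 samples) and the number of repetitions are illustrative assumptions, and a simulation is of course no substitute for the Chernoff argument.

```python
import math
import numpy as np

def max_multiplicity(n_cells, m, rng):
    """Largest number of times any single location is hit when m locations
    are drawn i.i.d. uniformly (with replacement) from n_cells locations."""
    samples = rng.integers(0, n_cells, size=m)
    return int(np.bincount(samples, minlength=n_cells).max())

n1 = n2 = 11
bound = 10 * math.log(n1 * n2)
worst = max(
    max_multiplicity(n1 * n2, 50, np.random.default_rng(seed)) for seed in range(100)
)
print(worst, bound)
```

The connection to the norm inequality is that the operator summing over the multiset weights each distinct location by its multiplicity, so its output is at most the largest multiplicity times that of the operator summing over distinct samples.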

Our objective in the remaining part of the section is to produce a matrix W satisfying Condition (44).

7.2 Construction of Dual Certificate 

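Suppose that we generate $j_0$ independent random location multisets $\Omega_i^{clean}$, where $\Omega_i^{clean}$ contains $q n_1 n_2$ i.i.d.
samples uniformly at random. Here, we set $q := \frac{(1-s)\rho}{j_0}$. This way the distribution of the multiset $\Omega^{clean}$ is the
same as $\Omega_1^{clean} \cup \Omega_2^{clean} \cup \cdots \cup \Omega_{j_0}^{clean}$.

We now propose constructing a dual certificate W as follows:

Construction of a dual certificate W via the golfing scheme.

1. Set $F_0 = \mathcal{P}_T(UV^* - \lambda\, \mathrm{sgn}(S_e))$, and $j_0 := 5 \log_{1/\epsilon}(n_1 n_2)$.

2. For every i ($1 \le i \le j_0$), let $F_i := \left( \mathcal{P}_T \mathcal{A} \mathcal{P}_T - \frac{1}{q} \mathcal{P}_T \mathcal{A}_{\Omega_i^{clean}} \mathcal{P}_T \right) F_{i-1}$.

3. Set $W := \sum_{i=1}^{j_0} \left( \frac{1}{q} \mathcal{A}_{\Omega_i^{clean}} + \mathcal{A}^\perp \right) F_{i-1}$.

Take $\lambda = \frac{1}{\sqrt{m \log(n_1 n_2)}}$. We will justify that W is a valid dual certificate, by examining the conditions in
(44) step by step.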

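(1) The first condition requires $\|\mathcal{P}_T(W + \lambda\, \mathrm{sgn}(S_e) - UV^*)\|_F$ to be reasonably small. Lemma 5 asserts
that there exist some constants $c_1, \tilde{c}_1 > 0$ such that if $m = \rho n_1 n_2 > c_1 \mu_1 c_s r \log^2(n_1 n_2)$ or, equivalently,
$q n_1 n_2 > \tilde{c}_1 \mu_1 c_s r \log(n_1 n_2)$, then

$$\|\mathcal{P}_T F_{j_0}\|_F \le \epsilon \|\mathcal{P}_T F_{j_0 - 1}\|_F \le \cdots \le \epsilon^{j_0} \|\mathcal{P}_T F_0\|_F \le \frac{1}{n_1^5 n_2^5} \left( \|UV^*\|_F + \lambda \|\mathrm{sgn}(S_e)\|_F \right) \quad (45)$$

$$\le \frac{1}{n_1^5 n_2^5} \left( \sqrt{r} + \lambda n_1 n_2 \right) < \frac{1}{n_1^5 n_2^5} \left( n_1 n_2 + \lambda n_1 n_2 \right) < \frac{\lambda}{n_1^2 n_2^2} \quad (46)$$

with high probability.

It remains to relate $\mathcal{P}_T(W + \lambda\, \mathrm{sgn}(S_e) - UV^*)$ with $\mathcal{P}_T F_{j_0}$. By construction,

$$\begin{aligned} -\mathcal{P}_T(W + \lambda\, \mathrm{sgn}(S_e) - UV^*) &= \mathcal{P}_T F_0 - \sum_{i=1}^{j_0} \mathcal{P}_T \left( \tfrac{1}{q}\mathcal{A}_{\Omega_i^{clean}} + \mathcal{A}^\perp \right) \mathcal{P}_T F_{i-1} \\ &= \mathcal{P}_T F_0 - \mathcal{P}_T \left( \tfrac{1}{q}\mathcal{A}_{\Omega_1^{clean}} + \mathcal{A}^\perp \right) \mathcal{P}_T F_0 - \sum_{i=2}^{j_0} \mathcal{P}_T \left( \tfrac{1}{q}\mathcal{A}_{\Omega_i^{clean}} + \mathcal{A}^\perp \right) \mathcal{P}_T F_{i-1} \\ &= \mathcal{P}_T F_1 - \sum_{i=2}^{j_0} \mathcal{P}_T \left( \tfrac{1}{q}\mathcal{A}_{\Omega_i^{clean}} + \mathcal{A}^\perp \right) \mathcal{P}_T F_{i-1} \\ &= \cdots = \mathcal{P}_T F_{j_0}. \end{aligned}$$

Plugging this into (46) establishes that

$$\|\mathcal{P}_T(W + \lambda\, \mathrm{sgn}(S_e) - UV^*)\|_F = \|\mathcal{P}_T F_{j_0}\|_F < \frac{\lambda}{n_1^2 n_2^2}. \quad (47)$$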



(2) The second component of the proof is to develop a bound on $\|\mathcal{P}_{T^\perp}(W + \lambda\, \mathrm{sgn}(S_e))\|$, for which we
proceed by bounding $\|\mathcal{P}_{T^\perp}(W)\|$ and $\|\mathcal{P}_{T^\perp}(\lambda\, \mathrm{sgn}(S_e))\|$ separately. Recall that an inhomogeneity measure
$\nu(F)$ applied to a matrix $F \in T$ is defined in (40). The following lemma develops upper bounds on $\nu(F_i)$
for every i ($0 \le i \le j_0$).

Lemma 9. If $\lambda = \frac{1}{\sqrt{m \log(n_1 n_2)}}$, then there exist constants $c_7, c_9 > 0$ such that if $m > c_7 \max(\mu_1 c_s, \mu_4)\, r \log^2(n_1 n_2)$,
for any $0 \le i \le j_0$,

$$\nu(F_i) \le \frac{c_9 \max(\mu_1 c_s, \mu_2, \mu_4)\, r}{4^i\, n_1^2 n_2^2}$$

with probability exceeding $1 - (n_1 n_2)^{-3}$.

Proof. See Appendix H. □

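Now we are ready to develop an upper bound on $\|\mathcal{P}_{T^\perp}(W)\|$. Observe from the construction procedure
of W that

$$\|\mathcal{P}_{T^\perp}(W)\| = \Big\| \sum_{i=1}^{j_0} \mathcal{P}_{T^\perp} \left( \tfrac{1}{q}\mathcal{A}_{\Omega_i^{clean}} + \mathcal{A}^\perp \right) F_{i-1} \Big\| \le \sum_{i=1}^{j_0} \left\| \mathcal{P}_{T^\perp} \left( \tfrac{1}{q}\mathcal{A}_{\Omega_i^{clean}} + \mathcal{A}^\perp \right) F_{i-1} \right\|. \quad (48)$$

Combining Lemma 7 and Lemma 9 with some simple manipulations yields the following: if
$m > c_{13} \max(\mu_1 c_s, \mu_2, \mu_4)\, r \log^2(n_1 n_2)$, then with high probability one has, for all i ($1 \le i \le j_0$),

$$\left\| \mathcal{P}_{T^\perp} \left( \tfrac{1}{q}\mathcal{A}_{\Omega_i^{clean}} + \mathcal{A}^\perp \right) F_{i-1} \right\| \le \left( \frac{1}{2} \right)^{i+3}.$$

Consequently, (48) can be further bounded by

$$\|\mathcal{P}_{T^\perp}(W)\| \le \sum_{i=1}^{j_0} \left( \frac{1}{2} \right)^{i+3} < \frac{1}{8} \quad (49)$$

with high probability.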

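We still need to bound $\|\mathcal{P}_{T^\perp}(\lambda\, \mathrm{sgn}(S_e))\|$, which is supplied in the following lemma.

Lemma 10. Consider $\lambda = \frac{1}{\sqrt{m \log(n_1 n_2)}}$, and suppose that s is a small constant. If $m > c_7 \max(\mu_1 c_s, \mu_4)\, r \log^2(n_1 n_2)$,
we can bound

$$\|\mathrm{sgn}(S_e)\| \le \sqrt{c_{22}\, \rho s\, n_1 n_2}\, \log^{1/2}(n_1 n_2) \quad \text{and} \quad \|\mathcal{P}_{T^\perp}(\lambda\, \mathrm{sgn}(S_e))\| \le \frac{1}{8} \quad (50)$$

with high probability.

Proof. See Appendix I. □

Putting (49) and (50) together yields

$$\|\mathcal{P}_{T^\perp}(W + \lambda\, \mathrm{sgn}(S_e))\| \le \|\mathcal{P}_{T^\perp}(W)\| + \|\mathcal{P}_{T^\perp}(\lambda\, \mathrm{sgn}(S_e))\| \le \frac{1}{4}$$

with high probability.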

(3) By construction, $\mathcal{A}'_{(\Omega^{clean})^\perp}(W) = 0$.

(4) The last step is to bound $\|\mathcal{A}'_{\Omega^{clean}}(W)\|_\infty$, which is apparently bounded above by $\|\mathcal{A}_{\Omega^{clean}}(W)\|_\infty$.
By construction, one can express



|^0-a„(Ty)|| 



Jo 



-A 



A^ F,_i 



JO ^ 

rt i 



Jo 



< 



7 — max 

^ q (fc,0e[«i]x[«2] 

2 — 1 



/Ml > ^^ ^ /7 



i=l 



Jo 



51og(nin2) /max(^iCs,Ai2,/i4)r 



4'"infn2 



^E 

■'■^ J_ / 25max(/iiCs,A^2,A^4)?-log^ (»l?^2) 



?n^ 



< 



1 



A 



4 y 771 log {nin2) 



when $m > c_{25} \max(\mu_1 c_s, \mu_2, \mu_4)\, r \log^3(n_1 n_2)$ for some constant $c_{25} > 0$.

We have verified that W satisfies the four conditions required in (44), and is hence a valid dual certificate.
This completes the proof.

8 Concluding Remarks 

We present an efficient nonparametric algorithm to estimate a spectrally sparse object from its partial
time-domain samples, which poses spectral compressed sensing as a low-rank Hankel structured matrix
completion problem. Under mild incoherence conditions, our algorithm enables recovery of the multi-dimensional
unknown frequencies with infinite precision, which remedies the basis mismatch issue that arises in
conventional CS paradigms. We have shown both theoretically and numerically that our algorithm is stable against
bounded noise and a constant portion of arbitrary corruptions, and can be extended to tasks such as super
resolution. To the best of our knowledge, our result on Hankel matrix completion is also the first theoretical
guarantee that is close to the information-theoretical limit (up to a logarithmic factor).

Our results are based on uniform random observation models. While this paper considers directly
taking a random subset of the time domain samples, it is also possible to take a random set of linear mixtures
of the time domain samples, as in the renowned CS setting [7]. This again can be translated into taking
linear measurements of the low-rank K-fold Hankel matrix, given as $y = \mathcal{B}(X_e)$. Unfortunately, due to the
Hankel structures, $\mathcal{B}$ does not satisfy the matrix restricted isometry property (RIP) [34], [44]. Nonetheless,
the technique developed in this paper can be extended without difficulty to analyze linear measurements, in
a similar flavor to the golfing scheme developed for CS in [22].

It remains to be seen whether it is possible to obtain performance guarantees for the proposed EMaC
algorithm similar to those in [24] for super resolution. It is also of great interest to develop efficient numerical
methods to solve the EMaC algorithm in order to handle massive datasets.



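A Proof of Lemma 2

(1) We first show that $E_L^* E_L$ and $(E_R E_R^*)^T$ coincide with the matrices $G_L$ and $G_R$. Since $Y_d$ is a diagonal
matrix, one can verify the identities

$$\left( Y_d^{l*} Z_L^* Z_L Y_d^{l} \right)_{i_1, i_2} = (y_{i_1}^* y_{i_2})^{l}\, (Z_L^* Z_L)_{i_1, i_2},$$

and

$$(Z_L^* Z_L)_{i_1, i_2} = \sum_{k=0}^{k_2 - 1} (z_{i_1}^* z_{i_2})^k = \begin{cases} \dfrac{1 - (z_{i_1}^* z_{i_2})^{k_2}}{1 - z_{i_1}^* z_{i_2}}, & \text{if } i_1 \ne i_2, \\ k_2, & \text{if } i_1 = i_2, \end{cases}$$

which immediately give

$$E_L^* E_L = \frac{1}{k_1 k_2} \left[ Z_L^*, \ Y_d^* Z_L^*, \ \cdots, \ (Y_d^*)^{k_1 - 1} Z_L^* \right] \begin{bmatrix} Z_L \\ Z_L Y_d \\ \vdots \\ Z_L Y_d^{k_1 - 1} \end{bmatrix} = \frac{1}{k_1 k_2} \sum_{l=0}^{k_1 - 1} Y_d^{l*} Z_L^* Z_L Y_d^{l} = \left( \frac{1}{k_1 k_2}\, \frac{1 - (y_{i_1}^* y_{i_2})^{k_1}}{1 - y_{i_1}^* y_{i_2}}\, \frac{1 - (z_{i_1}^* z_{i_2})^{k_2}}{1 - z_{i_1}^* z_{i_2}} \right)_{1 \le i_1, i_2 \le r}$$

with the convention that $\frac{1 - (y_i^* y_i)^{k_1}}{1 - y_i^* y_i} = k_1$ and $\frac{1 - (z_i^* z_i)^{k_2}}{1 - z_i^* z_i} = k_2$. That said, all diagonal entries satisfy
$(E_L^* E_L)_{l,l} = 1$, and the magnitude of off-diagonal entries can be calculated as

$$\left| (E_L^* E_L)_{i_1, i_2} \right| = \left| \frac{\sin[\pi k_1 (f_{1 i_1} - f_{1 i_2})]}{k_1 \sin[\pi (f_{1 i_1} - f_{1 i_2})]} \cdot \frac{\sin[\pi k_2 (f_{2 i_1} - f_{2 i_2})]}{k_2 \sin[\pi (f_{2 i_1} - f_{2 i_2})]} \right|.$$

Recall that this exactly coincides with the definition of $G_L$. Similarly, $G_R = (E_R E_R^*)^T$. These findings
immediately yield

$$\sigma_{\min}(E_L^* E_L) \ge \frac{1}{\mu_1}, \quad \text{and} \quad \sigma_{\min}(E_R E_R^*) \ge \frac{1}{\mu_1}. \quad (51)$$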

Ml Ml 
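The identity in part (1) can be checked numerically. The following is a minimal sketch, assuming the stacked form $E_L = \frac{1}{\sqrt{k_1 k_2}}[Z_L; Z_L Y_d; \ldots; Z_L Y_d^{k_1-1}]$ with $y_i = e^{j2\pi f_{1i}}$ and $z_i = e^{j2\pi f_{2i}}$; the helper names `build_E_L` and `dirichlet`, the frequencies, and the pencil sizes are illustrative choices, not taken from the paper.

```python
import numpy as np

def build_E_L(f1, f2, k1, k2):
    """Stack Z_L, Z_L Y_d, ..., Z_L Y_d^(k1-1) and scale by 1/sqrt(k1*k2)."""
    y = np.exp(2j * np.pi * np.asarray(f1))            # y_i = exp(j 2 pi f_{1i})
    z = np.exp(2j * np.pi * np.asarray(f2))            # z_i = exp(j 2 pi f_{2i})
    Z_L = z[None, :] ** np.arange(k2)[:, None]         # k2 x r Vandermonde block
    blocks = [Z_L * (y[None, :] ** l) for l in range(k1)]  # Z_L Y_d^l
    return np.vstack(blocks) / np.sqrt(k1 * k2)

def dirichlet(f, k):
    """|sin(pi k f) / (k sin(pi f))|, set to 1 where the denominator vanishes."""
    f = np.asarray(f, dtype=float)
    den = k * np.sin(np.pi * f)
    zero = np.abs(den) < 1e-12
    safe = np.where(zero, 1.0, den)                    # avoid division by zero
    return np.where(zero, 1.0, np.abs(np.sin(np.pi * k * f) / safe))

# Illustrative (hypothetical) frequency pairs and pencil parameters.
f1 = np.array([0.10, 0.37, 0.52, 0.81])
f2 = np.array([0.25, 0.05, 0.66, 0.90])
k1, k2 = 6, 5

E_L = build_E_L(f1, f2, k1, k2)
G = E_L.conj().T @ E_L                                 # Gram matrix E_L^* E_L
G_L = dirichlet(f1[:, None] - f1[None, :], k1) * dirichlet(f2[:, None] - f2[None, :], k2)

# Smallest mu_1 compatible with sigma_min(E_L^* E_L) >= 1/mu_1 for this instance.
mu1_lower = 1.0 / np.linalg.eigvalsh(G).min()
```

Here `np.abs(G)` should agree entrywise with the product of Dirichlet kernels `G_L`, and `mu1_lower` is the incoherence level this particular frequency set would need.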

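(2) Consider the case in which we only know $\sigma_{\min}(G_L) \ge \frac{1}{\mu_1}$ and $\sigma_{\min}(G_R) \ge \frac{1}{\mu_1}$. In fact, since
$|\langle A_b, \mathcal{P}_T A_a\rangle| = |\langle \mathcal{P}_T A_b, A_a\rangle|$, we only need to examine the situation where $\omega_b \le \omega_a$.
Observe that

$$|\langle A_b, \mathcal{P}_T A_a\rangle| \le |\langle A_b, UU^* A_a\rangle| + |\langle A_b, A_a VV^*\rangle| + |\langle A_b, UU^* A_a VV^*\rangle|.$$

Owing to the multi-fold Hankel structure of $A_a$, the matrix $UU^* \sqrt{\omega_a} A_a$ consists of $\omega_a$ columns of $UU^*$.
Since there are only $\omega_b$ nonzero entries in $A_b$, each of magnitude $\frac{1}{\sqrt{\omega_b}}$, we can derive

$$|\langle A_b, UU^* A_a\rangle| \le \|A_b\|_1 \|UU^* A_a\|_\infty = \omega_b \cdot \frac{1}{\sqrt{\omega_b}} \cdot \max_{\alpha,\beta} \left|(UU^* A_a)_{\alpha,\beta}\right| \le \sqrt{\frac{\omega_b}{\omega_a}} \max_{\alpha,\beta} \left|(UU^*)_{\alpha,\beta}\right|,$$

where $\|\cdot\|_1$ and $\|\cdot\|_\infty$ denote the entrywise $\ell_1$ and $\ell_\infty$ norms.

Denote by $M_{*k}$ and $M_{k*}$ the $k$th column and $k$th row of $M$, respectively, then it can be observed that
each entry of $UU^*$ is bounded in magnitude by

$$\left|(UU^*)_{k,l}\right| = \left|\left(E_L (E_L^* E_L)^{-1} E_L^*\right)_{k,l}\right| = \left|(E_L)_{k*} (E_L^* E_L)^{-1} \left((E_L)_{l*}\right)^*\right|
\le \|(E_L)_{k*}\|_F \|(E_L)_{l*}\|_F \left\|(E_L^* E_L)^{-1}\right\|
\le \frac{r}{k_1 k_2\, \sigma_{\min}(E_L^* E_L)} \le \frac{\mu_1 c_s r}{n_1 n_2}, \qquad (52)$$

which immediately implies that

$$|\langle A_b, UU^* A_a\rangle| \le \sqrt{\frac{\omega_b}{\omega_a}}\,\frac{\mu_1 c_s r}{n_1 n_2}. \qquad (53)$$

Similarly, one can derive

$$|\langle A_b, A_a VV^*\rangle| \le \sqrt{\frac{\omega_b}{\omega_a}}\,\frac{\mu_1 c_s r}{n_1 n_2}. \qquad (54)$$

We still need to bound the magnitude of $\langle UU^* A_a VV^*, A_b\rangle$. One can observe that for any $1 \le k \le k_1 k_2$:

$$\|(UU^*)_{k*}\|_F \le \|(E_L)_{k*}\|_F \left\|(E_L^* E_L)^{-1} E_L^*\right\| \le \sqrt{\frac{r}{k_1 k_2\, \sigma_{\min}(E_L^* E_L)}} \le \sqrt{\frac{c_s r}{n_1 n_2\, \sigma_{\min}(E_L^* E_L)}}.$$

Similarly, for any $1 \le l \le (n_1 - k_1 + 1)(n_2 - k_2 + 1)$, one has $\|(VV^*)_{*l}\|_F \le \sqrt{\frac{c_s r}{n_1 n_2\, \sigma_{\min}(E_R E_R^*)}}$. The
magnitude of all entries of $UU^* A_a VV^*$ can now be bounded by

$$\max_{k,l} \left|(UU^* A_a VV^*)_{k,l}\right| \le \|A_a\| \max_k \|(UU^*)_{k*}\|_F \max_l \|(VV^*)_{*l}\|_F
\le \frac{1}{\sqrt{\omega_a}}\,\frac{c_s r}{n_1 n_2 \sqrt{\sigma_{\min}(E_L^* E_L)\,\sigma_{\min}(E_R E_R^*)}}
\le \frac{1}{\sqrt{\omega_a}}\,\frac{\mu_1 c_s r}{n_1 n_2}.$$

Since $A_b$ has only $\omega_b$ nonzero entries, each of magnitude $\frac{1}{\sqrt{\omega_b}}$, one can verify that

$$|\langle UU^* A_a VV^*, A_b\rangle| \le \left(\max_{k,l} \left|(UU^* A_a VV^*)_{k,l}\right|\right) \frac{1}{\sqrt{\omega_b}}\, \omega_b = \sqrt{\frac{\omega_b}{\omega_a}}\,\frac{\mu_1 c_s r}{n_1 n_2}. \qquad (55)$$

The above bounds (53), (54) and (55) taken together lead to

$$|\langle A_b, \mathcal{P}_T A_a\rangle| \le |\langle UU^* A_a, A_b\rangle| + |\langle A_a VV^*, A_b\rangle| + |\langle UU^* A_a VV^*, A_b\rangle| \le \sqrt{\frac{\omega_b}{\omega_a}}\,\frac{3\mu_1 c_s r}{n_1 n_2}. \qquad (56)$$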
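As a sanity check on the entrywise projector bound used in part (2), namely $|(UU^*)_{k,l}| \le \frac{r}{k_1 k_2\,\sigma_{\min}(E_L^* E_L)}$, the sketch below builds $UU^* = E_L (E_L^* E_L)^{-1} E_L^*$ for one small instance. The construction of $E_L$ and all numerical values are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical frequency pairs and pencil parameters (illustration only).
f1 = np.array([0.10, 0.37, 0.52, 0.81])
f2 = np.array([0.25, 0.05, 0.66, 0.90])
k1, k2, r = 6, 5, 4

y, z = np.exp(2j * np.pi * f1), np.exp(2j * np.pi * f2)
Z_L = z[None, :] ** np.arange(k2)[:, None]                 # k2 x r block
E_L = np.vstack([Z_L * y[None, :] ** l for l in range(k1)]) / np.sqrt(k1 * k2)

gram = E_L.conj().T @ E_L                                  # E_L^* E_L
P_U = E_L @ np.linalg.solve(gram, E_L.conj().T)            # projector UU^*
sigma_min = np.linalg.eigvalsh(gram).min()

# Each row of E_L holds r entries of magnitude 1/sqrt(k1*k2).
row_norms_sq = np.sum(np.abs(E_L) ** 2, axis=1)

max_entry = np.abs(P_U).max()                              # max |(UU^*)_{k,l}|
bound = r / (k1 * k2 * sigma_min)                          # middle term of (52)
```

The quantity `bound` is the intermediate expression in (52); replacing $1/\sigma_{\min}$ by $\mu_1$ and $1/(k_1 k_2)$ by $c_s/(n_1 n_2)$ gives the final form.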



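(3) On the other hand, the bound on $|\langle A_b, \mathcal{P}_T A_a\rangle|$ immediately leads to the following upper bounds on
$\sum_a |\langle UU^* A_a VV^*, A_b\rangle|^2 \omega_a$ and $\sum_a |\langle \mathcal{P}_T A_b, A_a\rangle|^2 \omega_a$:

$$\sum_{a\in[n_1]\times[n_2]} |\langle UU^* A_a VV^*, A_b\rangle|^2 \omega_a \le \sum_{a\in[n_1]\times[n_2]} \left(\sqrt{\frac{\omega_b}{\omega_a}}\,\frac{\mu_1 c_s r}{n_1 n_2}\right)^2 \omega_a = \omega_b \sum_{a\in[n_1]\times[n_2]} \left(\frac{\mu_1 c_s r}{n_1 n_2}\right)^2 = \frac{\mu_1^2 c_s^2 r^2}{n_1 n_2}\,\omega_b,$$

which simply comes from the inequality (55), and

$$\sum_{a\in[n_1]\times[n_2]} |\langle \mathcal{P}_T A_b, A_a\rangle|^2 \omega_a \le \sum_{a\in[n_1]\times[n_2]} \left(\sqrt{\frac{\omega_b}{\omega_a}}\,\frac{3\mu_1 c_s r}{n_1 n_2}\right)^2 \omega_a = \omega_b \sum_{a\in[n_1]\times[n_2]} \left(\frac{3\mu_1 c_s r}{n_1 n_2}\right)^2 = \frac{9\mu_1^2 c_s^2 r^2}{n_1 n_2}\,\omega_b,$$

which is an immediate consequence of (56). These bounds indicate that $\mu_3 \le \mu_1^2 c_s^2 r$ and $\mu_4 \le 9\mu_1^2 c_s^2 r$.

We can also obtain an upper bound on $\mu_2$ through $\mu_1$ as follows. Observe that there exists a unitary
matrix $B$ such that

$$UV^* = E_L (E_L^* E_L)^{-\frac{1}{2}} B\, (E_R E_R^*)^{-\frac{1}{2}} E_R.$$

For any $(k, l)$, we can then bound

$$\left|(UV^*)_{k,l}\right| = \left|\left(E_L (E_L^* E_L)^{-\frac{1}{2}} B\, (E_R E_R^*)^{-\frac{1}{2}} E_R\right)_{k,l}\right|
\le \|(E_L)_{k*}\|_F \left\|(E_L^* E_L)^{-\frac{1}{2}}\right\| \|B\| \left\|(E_R E_R^*)^{-\frac{1}{2}}\right\| \|(E_R)_{*l}\|_F
\le \sqrt{\frac{r}{k_1 k_2}}\;\mu_1 \sqrt{\frac{r}{(n_1 - k_1 + 1)(n_2 - k_2 + 1)}} \le \frac{\mu_1 c_s r}{n_1 n_2}.$$

Since $A_{(k,l)}$ has only $\omega_{k,l}$ nonzero entries, each of magnitude $\frac{1}{\sqrt{\omega_{k,l}}}$, this leads to

$$\frac{1}{\omega_{k,l}^2}\left|\sum_{(\alpha,\beta)\in\Omega_e(k,l)} (UV^*)_{\alpha,\beta}\right|^2 = \frac{1}{\omega_{k,l}} \left|\langle UV^*, A_{(k,l)}\rangle\right|^2 \le \max_{\alpha,\beta} \left|(UV^*)_{\alpha,\beta}\right|^2 \le \frac{\mu_1^2 c_s^2 r^2}{n_1^2 n_2^2},$$

which indicates that $\mu_2 \le \mu_1^2 c_s^2 r$.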

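(4) Finally, we split $\sum_{a\in[n_1]\times[n_2]} |\langle \mathcal{P}_T A_b, \sqrt{\omega_a} A_a\rangle|^2$ as follows

$$\sum_{a\in[n_1]\times[n_2]} |\langle \mathcal{P}_T A_b, \sqrt{\omega_a} A_a\rangle|^2 = \sum_{a\in[n_1]\times[n_2]} |\langle (\mathcal{P}_U + \mathcal{P}_V - \mathcal{P}_U \mathcal{P}_V) A_b, \sqrt{\omega_a} A_a\rangle|^2$$
$$\le 3 \sum_{a\in[n_1]\times[n_2]} \left\{ |\langle \mathcal{P}_U A_b, \sqrt{\omega_a} A_a\rangle|^2 + |\langle \mathcal{P}_V A_b, \sqrt{\omega_a} A_a\rangle|^2 + |\langle \mathcal{P}_U \mathcal{P}_V A_b, \sqrt{\omega_a} A_a\rangle|^2 \right\}.$$

Now look at $\sum_a |\langle \mathcal{P}_U A_b, \sqrt{\omega_a} A_a\rangle|^2 = \sum_a |\langle UU^* A_b, \sqrt{\omega_a} A_a\rangle|^2$. We know that

$$\|UU^* A_b\|_F^2 \le \frac{\mu_1 c_s r}{n_1 n_2}, \qquad (57)$$

and that $UU^* A_b$ has $\omega_b$ non-zero columns, or, up to a column permutation,

$$UU^* A_b = \frac{1}{\sqrt{\omega_b}} \big[\, \underbrace{U_b}_{\omega_b \text{ columns}},\; 0 \,\big], \qquad (58)$$

and hence $\langle UU^* A_b, \sqrt{\omega_a} A_a\rangle$ is simply the sum of all entries of $UU^* A_b$ lying in the set $\Omega_e(a)$. Since there
are at most $\omega_b$ nonzero entries (due to the above structure of $UU^* A_b$) in each sum, we can bound

$$|\langle UU^* A_b, \sqrt{\omega_a} A_a\rangle|^2 = \left|\sum_{(\alpha,\beta)\in\Omega_e(a)} (UU^* A_b)_{\alpha,\beta}\right|^2 \le \omega_b \sum_{(\alpha,\beta)\in\Omega_e(a)} \left|(UU^* A_b)_{\alpha,\beta}\right|^2$$

using the inequality $\left(\sum_{i=1}^{\omega_b} a_i\right)^2 \le \omega_b \sum_{i=1}^{\omega_b} a_i^2$. This then gives

$$\sum_{a\in[n_1]\times[n_2]} |\langle UU^* A_b, \sqrt{\omega_a} A_a\rangle|^2 \le \omega_b \sum_{a\in[n_1]\times[n_2]} \sum_{(\alpha,\beta)\in\Omega_e(a)} \left|(UU^* A_b)_{\alpha,\beta}\right|^2 \le \omega_b \|UU^* A_b\|_F^2 \le \omega_b\,\frac{\mu_1 c_s r}{n_1 n_2},$$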
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where the last inequahty follows from Lemma 21 Similarly, one has 

^-^ nin2 

aS[ni]x[n2] 

To summarize, 



Y, \{rTAt„v^Aa.)\^ 

oe[ni]x[n2] 
a£[ni]x[n2] 

~ 11x712 nin2 

B Proof of Lemma El 

Consider any valid perturbation $H$ obeying $\mathcal{P}_\Omega(X + H) = \mathcal{P}_\Omega(X)$, and denote by $H_e$ the enhanced form of
$H$. We note that the constraint requires $\mathcal{A}'_\Omega(H_e) = 0$ (or $\mathcal{A}_\Omega(H_e) = 0$) and $\mathcal{A}^\perp(H_e) = 0$. In addition, set
$W_0 = \mathcal{P}_{T^\perp}(B)$ for any $B$ that satisfies $\langle B, \mathcal{P}_{T^\perp}(H_e)\rangle = \|\mathcal{P}_{T^\perp}(H_e)\|_*$ and $\|B\| \le 1$. Therefore, $W_0 \in T^\perp$
and $\|W_0\| \le 1$, and hence $UV^* + W_0$ is a subgradient of the nuclear norm at $X_e$. We will establish this
lemma by considering two scenarios separately.
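The subgradient construction above can be illustrated numerically. The sketch below is only a sanity check under simplifying assumptions: it uses a generic real low-rank matrix rather than an enhanced Hankel matrix, and the names `P_T_perp` and `nuc` are hypothetical helpers. It builds $W_0$ from the singular vectors of $\mathcal{P}_{T^\perp}(H)$ and checks the inequality $\|X + H\|_* \ge \|X\|_* + \langle UV^* + W_0, H\rangle$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 8, 2

# Generic rank-r matrix (an assumption; the paper works with the enhanced form X_e).
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
U_full, s, Vh_full = np.linalg.svd(X)
U, V = U_full[:, :r], Vh_full[:r, :].conj().T

def P_T_perp(M):
    """Projection onto the orthogonal complement of the tangent space T at X."""
    return (np.eye(n) - U @ U.conj().T) @ M @ (np.eye(n) - V @ V.conj().T)

def nuc(M):
    """Nuclear norm (sum of singular values)."""
    return np.linalg.norm(M, 'nuc')

H = rng.standard_normal((n, n))                  # arbitrary perturbation

# W0 attains <W0, P_T_perp(H)> = ||P_T_perp(H)||_* with ||W0|| <= 1.
Uh, sh, Vhh = np.linalg.svd(P_T_perp(H))
keep = sh > 1e-10                                # drop numerically zero directions
W0 = Uh[:, keep] @ Vhh[keep, :]

lhs = nuc(X + H)
rhs = nuc(X) + np.vdot(U @ V.conj().T + W0, H).real
```

With this choice of $W_0$, `rhs` equals $\|X\|_* + \langle UV^*, H\rangle + \|\mathcal{P}_{T^\perp}(H)\|_*$, which is the form used in the two scenarios that follow.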
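(1) Consider first the case in which $H_e$ satisfies

$$\|\mathcal{P}_T(H_e)\|_F \le \frac{n_1^2 n_2^2}{2} \|\mathcal{P}_{T^\perp}(H_e)\|_F. \qquad (59)$$

Since $UV^* + W_0$ is a subgradient of the nuclear norm at $X_e$, it follows that

$$\|X_e + H_e\|_* \ge \|X_e\|_* + \langle UV^* + W_0, H_e\rangle$$
$$= \|X_e\|_* + \langle UV^* + W, H_e\rangle + \langle W_0, H_e\rangle - \langle W, H_e\rangle$$
$$= \|X_e\|_* + \langle (\mathcal{A}'_\Omega + \mathcal{A}^\perp)(UV^* + W), H_e\rangle + \langle W_0, H_e\rangle - \langle W, H_e\rangle \qquad (60)$$
$$\ge \|X_e\|_* + \|\mathcal{P}_{T^\perp}(H_e)\|_* - \langle W, H_e\rangle, \qquad (61)$$

where (60) holds because $UV^* + W$ is assumed to be invariant under $\mathcal{A}'_\Omega + \mathcal{A}^\perp$, and (61) follows from the property of $W_0$ and the fact that $(\mathcal{A}'_\Omega + \mathcal{A}^\perp)(H_e) = 0$.
The last term of (61) can be bounded as

$$\langle W, H_e\rangle = \langle \mathcal{P}_T(W), H_e\rangle + \langle \mathcal{P}_{T^\perp}(W), H_e\rangle$$
$$\le \|\mathcal{P}_T(W)\|_F \|\mathcal{P}_T(H_e)\|_F + \|\mathcal{P}_{T^\perp}(W)\| \|\mathcal{P}_{T^\perp}(H_e)\|_*$$
$$\le \frac{1}{2 n_1^2 n_2^2} \|\mathcal{P}_T(H_e)\|_F + \frac{1}{2} \|\mathcal{P}_{T^\perp}(H_e)\|_*,$$

where the last inequality follows from the assumptions on $\mathcal{P}_T(W)$ and $\mathcal{P}_{T^\perp}(W)$. Plugging this into (61) yields

$$\|X_e + H_e\|_* \ge \|X_e\|_* - \frac{1}{2 n_1^2 n_2^2} \|\mathcal{P}_T(H_e)\|_F + \frac{1}{2} \|\mathcal{P}_{T^\perp}(H_e)\|_*$$
$$\ge \|X_e\|_* - \frac{1}{4} \|\mathcal{P}_{T^\perp}(H_e)\|_F + \frac{1}{2} \|\mathcal{P}_{T^\perp}(H_e)\|_F \qquad (62)$$
$$\ge \|X_e\|_* + \frac{1}{4} \|\mathcal{P}_{T^\perp}(H_e)\|_F,$$

where (62) follows from the inequality $\|M\|_* \ge \|M\|_F$ and (59). Therefore, $X_e$ is the minimizer of EMaC.
We still need to prove the uniqueness of the minimizer. The inequality (62) implies that $\|X_e + H_e\|_* = \|X_e\|_*$ only when $\|\mathcal{P}_{T^\perp}(H_e)\|_F = 0$. If $\|\mathcal{P}_{T^\perp}(H_e)\|_F = 0$, then $\|\mathcal{P}_T(H_e)\|_F \le \frac{n_1^2 n_2^2}{2} \|\mathcal{P}_{T^\perp}(H_e)\|_F = 0$, and
hence $\mathcal{P}_{T^\perp}(H_e) = \mathcal{P}_T(H_e) = 0$, which only occurs when $H_e = 0$. Hence, $X_e$ is the unique minimizer in
this situation.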

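(2) On the other hand, consider the complement scenario where the following holds

$$\|\mathcal{P}_T(H_e)\|_F > \frac{n_1^2 n_2^2}{2} \|\mathcal{P}_{T^\perp}(H_e)\|_F. \qquad (63)$$

We would first like to bound $\left\|\left(\frac{n_1 n_2}{m}\mathcal{A}_\Omega + \mathcal{A}^\perp\right)\mathcal{P}_T(H_e)\right\|_F$ and $\left\|\left(\frac{n_1 n_2}{m}\mathcal{A}_\Omega + \mathcal{A}^\perp\right)\mathcal{P}_{T^\perp}(H_e)\right\|_F$. The former
term can be lower bounded by

$$\left\|\left(\tfrac{n_1 n_2}{m}\mathcal{A}_\Omega + \mathcal{A}^\perp\right)\mathcal{P}_T(H_e)\right\|_F^2
= \left\langle \left(\tfrac{n_1 n_2}{m}\mathcal{A}_\Omega + \mathcal{A}^\perp\right)\mathcal{P}_T(H_e),\ \left(\tfrac{n_1 n_2}{m}\mathcal{A}_\Omega + \mathcal{A}^\perp\right)\mathcal{P}_T(H_e)\right\rangle$$
$$= \left\langle \tfrac{n_1 n_2}{m}\mathcal{A}_\Omega \mathcal{P}_T(H_e),\ \tfrac{n_1 n_2}{m}\mathcal{A}_\Omega \mathcal{P}_T(H_e)\right\rangle + \left\langle \mathcal{A}^\perp \mathcal{P}_T(H_e),\ \mathcal{A}^\perp \mathcal{P}_T(H_e)\right\rangle$$
$$\ge \left\langle \mathcal{P}_T(H_e),\ \tfrac{n_1 n_2}{m}\mathcal{A}_\Omega \mathcal{P}_T(H_e)\right\rangle + \left\langle \mathcal{P}_T(H_e),\ \mathcal{A}^\perp \mathcal{P}_T(H_e)\right\rangle \qquad (64)$$
$$= \left\langle \mathcal{P}_T(H_e),\ \mathcal{P}_T\left(\tfrac{n_1 n_2}{m}\mathcal{A}_\Omega + \mathcal{A}^\perp\right)\mathcal{P}_T(H_e)\right\rangle$$
$$= \left\langle \mathcal{P}_T(H_e), \mathcal{P}_T(H_e)\right\rangle + \left\langle \mathcal{P}_T(H_e),\ \left(\tfrac{n_1 n_2}{m}\mathcal{P}_T \mathcal{A}_\Omega \mathcal{P}_T - \mathcal{P}_T \mathcal{A} \mathcal{P}_T\right)\mathcal{P}_T(H_e)\right\rangle$$
$$\ge \|\mathcal{P}_T(H_e)\|_F^2 - \left\|\mathcal{P}_T \mathcal{A} \mathcal{P}_T - \tfrac{n_1 n_2}{m}\mathcal{P}_T \mathcal{A}_\Omega \mathcal{P}_T\right\| \|\mathcal{P}_T(H_e)\|_F^2$$
$$= \left(1 - \left\|\mathcal{P}_T \mathcal{A} \mathcal{P}_T - \tfrac{n_1 n_2}{m}\mathcal{P}_T \mathcal{A}_\Omega \mathcal{P}_T\right\|\right) \|\mathcal{P}_T(H_e)\|_F^2 \qquad (65)$$
$$\ge \frac{1}{2} \|\mathcal{P}_T(H_e)\|_F^2. \qquad (66)$$

On the other hand, since the operator norm of any projection operator is bounded above by 1, one can
verify that

$$\left\|\frac{n_1 n_2}{m}\mathcal{A}_\Omega + \mathcal{A}^\perp\right\| \le \frac{n_1 n_2}{m}\left(\left\|\mathcal{A}_{a_1} + \mathcal{A}^\perp\right\| + \sum_{i=2}^{m} \left\|\mathcal{A}_{a_i}\right\|\right) \le n_1 n_2,$$

where $a_i$ ($1 \le i \le m$) are $m$ uniform random indices that form $\Omega$. This implies the following bound:

$$\left\|\left(\frac{n_1 n_2}{m}\mathcal{A}_\Omega + \mathcal{A}^\perp\right)\mathcal{P}_{T^\perp}(H_e)\right\|_F \le n_1 n_2 \|\mathcal{P}_{T^\perp}(H_e)\|_F \le \frac{2}{n_1 n_2} \|\mathcal{P}_T(H_e)\|_F, \qquad (67)$$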



where the last inequality arises from our assumption. Combining this with the above two bounds yields 



= 



n^An + A^){He) > r-^An + A^)vT{He) - (^A. + A)7',.( 

\777 / fV777 / fV777 / 



{He 



>Jh\VT{H,)h-^-\\VT{He 
V 2 '^ ni772 

>l\\VT{He)\\^>'^\\VT^{He)\\^>0, 



F - 4 II' T^ ^-'-'e^llF 

which immediately indicates Vt-^ {He) = and Vt {He) = 0. Hence, (j63)) can only hold when He = 0. 



C Proof of Lemma S] 

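By definition, we have the identities

$$\|\mathcal{P}_T(A_{(k,l)})\|_F^2 = \langle \mathcal{P}_T(A_{(k,l)}), A_{(k,l)}\rangle
= \langle \mathcal{P}_U(A_{(k,l)}) + \mathcal{P}_V(A_{(k,l)}) - \mathcal{P}_U \mathcal{P}_V(A_{(k,l)}), A_{(k,l)}\rangle
= \|\mathcal{P}_U(A_{(k,l)})\|_F^2 + \|\mathcal{P}_V(A_{(k,l)})\|_F^2 - \|\mathcal{P}_U \mathcal{P}_V(A_{(k,l)})\|_F^2.$$

Since $U$ (resp. $V$) and $E_L$ (resp. $E_R$) determine the same column (resp. row) space, we can write

$$UU^* = E_L (E_L^* E_L)^{-1} E_L^* \quad\text{and}\quad VV^* = E_R^* (E_R E_R^*)^{-1} E_R,$$

and thus

$$\|\mathcal{P}_T(A_{(k,l)})\|_F^2 \le \|\mathcal{P}_U(A_{(k,l)})\|_F^2 + \|\mathcal{P}_V(A_{(k,l)})\|_F^2
\le \left\|E_L (E_L^* E_L)^{-1} E_L^* A_{(k,l)}\right\|_F^2 + \left\|A_{(k,l)} E_R^* (E_R E_R^*)^{-1} E_R\right\|_F^2
\le \frac{1}{\sigma_{\min}(E_L^* E_L)} \left\|E_L^* A_{(k,l)}\right\|_F^2 + \frac{1}{\sigma_{\min}(E_R E_R^*)} \left\|A_{(k,l)} E_R^*\right\|_F^2.$$

Note that $\sqrt{\omega_{k,l}}\, E_L^* A_{(k,l)}$ consists of $\omega_{k,l}$ columns of $E_L^*$ (and hence it contains $r\omega_{k,l}$ nonzero entries in total).
Owing to the fact that each entry of $E_L^*$ has magnitude $\frac{1}{\sqrt{k_1 k_2}}$, one can derive

$$\left\|E_L^* A_{(k,l)}\right\|_F^2 = \frac{1}{\omega_{k,l}} \cdot r\omega_{k,l} \cdot \frac{1}{k_1 k_2} = \frac{r}{k_1 k_2} \le \frac{c_s r}{n_1 n_2}.$$

A similar argument yields

$$\left\|A_{(k,l)} E_R^*\right\|_F^2 \le \frac{c_s r}{n_1 n_2}.$$

We know from Lemma 2 that $E_L^* E_L = G_L$ and $E_R E_R^* = G_R^\top$, and hence $\sigma_{\min}(E_L^* E_L) \ge \frac{1}{\mu_1}$ and
$\sigma_{\min}(E_R E_R^*) \ge \frac{1}{\mu_1}$. One can, therefore, conclude that for every $(k, l) \in [n_1] \times [n_2]$,

$$\|\mathcal{P}_U(A_{(k,l)})\|_F^2 \le \frac{\mu_1 c_s r}{n_1 n_2}, \quad \|\mathcal{P}_V(A_{(k,l)})\|_F^2 \le \frac{\mu_1 c_s r}{n_1 n_2}, \quad\text{and}\quad \|\mathcal{P}_T(A_{(k,l)})\|_F^2 \le \frac{2\mu_1 c_s r}{n_1 n_2}.$$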
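The first step of this proof, $\|\mathcal{P}_T(A)\|_F^2 = \|\mathcal{P}_U(A)\|_F^2 + \|\mathcal{P}_V(A)\|_F^2 - \|\mathcal{P}_U\mathcal{P}_V(A)\|_F^2 \le \|\mathcal{P}_U(A)\|_F^2 + \|\mathcal{P}_V(A)\|_F^2$, holds for any matrix $A$ and any pair of orthonormal bases $U$, $V$. A small numerical sketch, assuming random subspaces and a random test matrix in place of the Hankel basis element $A_{(k,l)}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, r = 7, 9, 3

# Random orthonormal bases (assumed stand-ins for the singular subspaces of X_e).
U, _ = np.linalg.qr(rng.standard_normal((n1, r)) + 1j * rng.standard_normal((n1, r)))
V, _ = np.linalg.qr(rng.standard_normal((n2, r)) + 1j * rng.standard_normal((n2, r)))
PU, PV = U @ U.conj().T, V @ V.conj().T

# Random test matrix standing in for a basis element A_(k,l).
A = rng.standard_normal((n1, n2)) + 1j * rng.standard_normal((n1, n2))

def fro2(M):
    """Squared Frobenius norm."""
    return np.linalg.norm(M, 'fro') ** 2

PT_A = PU @ A + A @ PV - PU @ A @ PV          # P_T(A) = P_U(A) + P_V(A) - P_U P_V(A)

lhs = fro2(PT_A)
inner = np.vdot(PT_A, A).real                 # <P_T(A), A>
decomp = fro2(PU @ A) + fro2(A @ PV) - fro2(PU @ A @ PV)
upper = fro2(PU @ A) + fro2(A @ PV)
```

The remaining steps of the proof then only require bounding `fro2(PU @ A)` and `fro2(A @ PV)` separately, which is where the structure of $E_L$ and $E_R$ enters.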

D Proof of Lemma [5] 

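Define a family of operators

$$\forall (k, l) \in [n_1] \times [n_2]: \quad \mathcal{Z}_{(k,l)} := \frac{n_1 n_2}{m} \mathcal{P}_T \mathcal{A}_{(k,l)} \mathcal{P}_T - \frac{1}{m} \mathcal{P}_T \mathcal{A} \mathcal{P}_T.$$

We can also compute

$$\mathcal{P}_T \mathcal{A}_{(k,l)} \mathcal{P}_T (M) = \mathcal{P}_T \left\{ \langle A_{(k,l)}, \mathcal{P}_T M\rangle A_{(k,l)} \right\} = \mathcal{P}_T(A_{(k,l)}) \langle \mathcal{P}_T(A_{(k,l)}), M\rangle \qquad (69)$$

and hence

$$\left(\mathcal{P}_T \mathcal{A}_{(k,l)} \mathcal{P}_T\right)^2 (M) = \left[\mathcal{P}_T \mathcal{A}_{(k,l)} \mathcal{P}_T \left\{\mathcal{P}_T(A_{(k,l)})\right\}\right] \langle \mathcal{P}_T(A_{(k,l)}), M\rangle
= \mathcal{P}_T \left\{ \langle A_{(k,l)}, \mathcal{P}_T(A_{(k,l)})\rangle A_{(k,l)} \right\} \langle \mathcal{P}_T(A_{(k,l)}), M\rangle. \qquad (70)$$

Comparing (69) and (70) gives

$$\left(\mathcal{P}_T \mathcal{A}_{(k,l)} \mathcal{P}_T\right)^2 = \langle A_{(k,l)}, \mathcal{P}_T(A_{(k,l)})\rangle\, \mathcal{P}_T \mathcal{A}_{(k,l)} \mathcal{P}_T \preceq \frac{2\mu_1 c_s r}{n_1 n_2}\, \mathcal{P}_T \mathcal{A}_{(k,l)} \mathcal{P}_T, \qquad (71)$$

where the inequality follows from our assumption that

$$\langle A_{(k,l)}, \mathcal{P}_T(A_{(k,l)})\rangle = \|\mathcal{P}_T(A_{(k,l)})\|_F^2 \le \frac{2\mu_1 c_s r}{n_1 n_2}.$$

Let $a_i$ ($1 \le i \le m$) be $m$ independent random pairs uniformly chosen from $[n_1] \times [n_2]$, then we have
$\mathbb{E}(\mathcal{Z}_{a_i}) = 0$. This further gives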



EZI^eC^^^^VtA^^Vt 



E 



ni7i2 



771 



VTAa,VT 



"1"2 ] 



1 



'.{VTAa^VT) - ^{VtAVt) 



m^ 
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We can then bound the operator norm as 



E(Zf.) < 



'•1"-2t 



^VtA^^VtY 



[VtAVtY 



< 



2 ? 
"-l"2 



1 



{VTAa.VT) 

< ' ' ^ IE {VTAa^Vr) + -^ 

m^ nin2 m'^ 

= ''^"'^T"" ^\'PtAVt\\ + -, 



(72) 



(73) 



where (f72|) uses the fact that VrAaiVT ^ 0. Besides, the first equahty of ([71]) gives ||7^Tw4(;i;,/)pT|| < 

II II 2 II II II II II II 2 

||7'TA(fe_;)||p \\VTA(kA)'PT\\ and hence \\VTA(^kA)'PT\\ < ||pT^(fc,/)||p, which hnmediately yields 



\ZaA\ < 



711712 



711^2 



VtAo^VtW + — WVtAVtW < 
m ' m m 



\VtA. 



2,1^ 4/iiCsr 



a; llF 



< 

771 771 



This together with ([75| gives 



27nyo 



> 2. 



Applying the Operator Bernstein Inequality [28, Theorem 6] yields that for any t < 2, we have 



E^- 



> i < 2rti?i2 cxp — 



16^ 



Finally, one can observe that X^Tii ^at is equivalent to "'^^^^ VtAuVt — VtAVt in distribution, which 
completes the proof. 



E Proof of Lemma [6] 

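Fix any $b \in [n_1] \times [n_2]$. For any $a \in [n_1] \times [n_2]$, define

$$z_a = \frac{1}{q n_1 n_2} \langle A_b, \mathcal{P}_T \mathcal{A} F\rangle - \left\langle A_b, \frac{1}{q} \mathcal{P}_T A_a\right\rangle \langle A_a, F\rangle.$$

Then for any i.i.d. $a_i$'s chosen uniformly at random from $[n_1] \times [n_2]$, we can easily check that $\mathbb{E}(z_{a_i}) = 0$.
Define a multiset $\Omega_l := \{a_i \mid 1 \le i \le q n_1 n_2\}$, then the decomposition

$$\mathcal{A}_{\Omega_l} F = \sum_{i=1}^{q n_1 n_2} A_{a_i} \langle A_{a_i}, F\rangle$$

allows us to derive

$$\langle A_b, \mathcal{P}_T \mathcal{A}_{\Omega_l} F\rangle = \sum_{i=1}^{q n_1 n_2} \langle A_b, \mathcal{P}_T A_{a_i}\rangle \langle A_{a_i}, F\rangle,$$

and thus

$$\sum_{i=1}^{q n_1 n_2} z_{a_i} = \langle A_b, \mathcal{P}_T \mathcal{A} F\rangle - \sum_{i=1}^{q n_1 n_2} \left\langle A_b, \frac{1}{q} \mathcal{P}_T A_{a_i}\right\rangle \langle A_{a_i}, F\rangle
= \langle A_b, \mathcal{P}_T \mathcal{A} F\rangle - \frac{1}{q} \langle A_b, \mathcal{P}_T \mathcal{A}_{\Omega_l} F\rangle
= \left\langle A_b, \left(\mathcal{P}_T \mathcal{A} \mathcal{P}_T - \frac{1}{q} \mathcal{P}_T \mathcal{A}_{\Omega_l} \mathcal{P}_T\right) F\right\rangle.$$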
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Owing to the fact that Ezq^ = 0, we can bound the variance of each term as fohows 



E \zc,f ^Yav (( Ab,-VTA^^ ){Ac.,,F] 



<E 



Ab,-VTA^^ )(^a.,-F) 



< 



ae[ni]x[n2] 

1 u{F) 



ni7i2 



Ab,-VTAa,){Aa.,F 



g^ 711^2 



J2 \{rTAh,Aa.)fu;a 



aE[ni]x[n2] 



tJ.iriy{F) 

^ 7 ^^b, 

\qn1n2) 

where the last inequahty arises from the definition of /i4, i.e. for every b £ [rii] x [7^2], 



ie[ni]x[n2] 



nin2 



-LOb- 



This immediately gives 



— E ( y^' |_^ |2 ] < tJ.4ri^{F) ^ max{/j4,3/ziCs}rt/(F) 
cjb \^ -^ °'^ j - qn^n^ - qnin2 



■=V. 



On the other hand, Lemma 2 shows the inequality

$$|\langle A_b, \mathcal{P}_T A_a\rangle| \le \sqrt{\frac{\omega_b}{\omega_a}}\,\frac{3\mu_1 c_s r}{n_1 n_2},$$



which further leads to 



Ab,-VTAa.){Aa.,F 



< V^al^{F)^^\{Ab,VTAa.) 

V^bq 

< Vhf)- 



q 711712 



Since ^ {Ab,VTAF) = E ( Ab, ^VtA^, ) (A„, , F) , one has as weh 



g71l7l2 



{Ab,VTAF) 



E{Ab,-VTAa){A^,F) 



q 711712 



which immediately leads to 



1,1 1 



1 



<yMF) 



qni?i2 
1 6/niCsr 

g 71l7t2 



(A5,7't^F) 






The above bounds indicate that 



2V 



>VHF)- 



(74) 



(75) 



Applying the operator Bernstein inequality [28l Theorem 6] yields for any t < i' (i^), 

2 



1 



E 



> t < cg exp 



tqnin2 



4max{/i4, 3^iCs} rv(F) 
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Thus, there are some constants 07,07 > such that whenever qnin2 > £7 max {/i4, 3/xiCs} r log (77,171,2) or, 
equivalently, m > C7inax{/i4, 3/iiCs}rlog (71177,2), we have 

> 7^(-^) ^ C6 exp - ,^_„.r.. o.. „ 1 „.yi:^N ^ 



y Wb ^ / ^ 16max{/i4,3/iiCs}ri^(F)/ (niri2) 

Finally, we observe that in distribution, 

// 1 \ \ |y-?"i"2^ |2 

z; 7't-47't - -VrAn.VT ]f] = max '^*=i "' ' 



q } } be[ni]x[n2] Wb 

Applying a simple union bound over all h E [ni] x [77,2] allows us to derive ((4T|) . 

F Proof of Lemma [7] 

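For any $a \in [n_1] \times [n_2]$, define

$$H_a = \frac{1}{q} \mathcal{P}_{T^\perp}(A_a) \langle A_a, F\rangle + \frac{1}{q n_1 n_2} \mathcal{P}_{T^\perp} \mathcal{A}^\perp(F).$$

Let $a_i$ ($1 \le i \le q n_1 n_2$) be independently and uniformly drawn from $[n_1] \times [n_2]$, which forms $\Omega_l$. Observing
that

$$\mathcal{A} F = \sum_{a\in[n_1]\times[n_2]} A_a \langle A_a, F\rangle,$$

we can write

$$\mathcal{P}_{T^\perp} \mathcal{A} F = \sum_{a\in[n_1]\times[n_2]} \mathcal{P}_{T^\perp}(A_a) \langle A_a, F\rangle.$$

This immediately gives

$$\mathbb{E} H_{a_i} = \frac{1}{q n_1 n_2} \mathcal{P}_{T^\perp} \mathcal{A}^\perp(F) + \frac{1}{q n_1 n_2} \sum_{a\in[n_1]\times[n_2]} \mathcal{P}_{T^\perp}(A_a) \langle A_a, F\rangle
= \frac{1}{q n_1 n_2} \mathcal{P}_{T^\perp} \mathcal{A}^\perp(F) + \frac{1}{q n_1 n_2} \mathcal{P}_{T^\perp} \mathcal{A}(F)
= \frac{1}{q n_1 n_2} \mathcal{P}_{T^\perp}(F) = 0.$$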
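The decomposition $\mathcal{A}F = \sum_a A_a\langle A_a, F\rangle$ used above says that $\mathcal{A}$ is the orthogonal projection onto the span of the basis matrices $A_a$, so that $\mathcal{A} + \mathcal{A}^\perp$ is the identity. The sketch below illustrates this in the simplified one-fold (1-D Hankel) case, which is an assumption made to keep the example short; `hankel_basis` and `hankel_project` are hypothetical helper names.

```python
import numpy as np

def hankel_basis(n, k):
    """Orthonormal basis {A_a} of k x (n-k+1) Hankel matrices (one-fold case)."""
    basis = []
    for a in range(n):
        A = np.zeros((k, n - k + 1))
        for i in range(k):
            j = a - i
            if 0 <= j < n - k + 1:
                A[i, j] = 1.0                     # a-th skew-diagonal
        basis.append(A / np.linalg.norm(A))       # entries become 1/sqrt(omega_a)
    return basis

def hankel_project(F, basis):
    """Apply A(F) = sum_a A_a <A_a, F>."""
    return sum(A * np.vdot(A, F) for A in basis)

rng = np.random.default_rng(4)
n, k = 9, 4
basis = hankel_basis(n, k)

F = rng.standard_normal((k, n - k + 1)) + 1j * rng.standard_normal((k, n - k + 1))
AF = hankel_project(F, basis)                     # Hankel part of F
A_perp_F = F - AF                                 # A^perp(F)
```

In this toy setting `AF` averages `F` along each skew-diagonal, and `A_perp_F` is what remains; the two pieces are orthogonal.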

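Moreover, we have, in distribution, the following identity

$$\mathcal{P}_{T^\perp}\left(\frac{1}{q} \mathcal{A}_{\Omega_l} + \mathcal{A}^\perp\right)(F) = \sum_{i=1}^{q n_1 n_2} H_{a_i}.$$

On the other hand, since $\mathbb{E} H_{a_i} = 0$, if we denote $Y_{a_i} = \frac{1}{q} \mathcal{P}_{T^\perp}(A_{a_i}) \langle A_{a_i}, F\rangle$, then $H_{a_i} = Y_{a_i} - \mathbb{E} Y_{a_i}$, and
hence

$$\mathbb{E} H_{a_i} H_{a_i}^* = \mathbb{E}\left\{(Y_{a_i} - \mathbb{E} Y_{a_i})(Y_{a_i} - \mathbb{E} Y_{a_i})^*\right\} \preceq \mathbb{E}\, Y_{a_i} Y_{a_i}^* = \frac{1}{q^2 n_1 n_2} \sum_{a\in[n_1]\times[n_2]} |\langle A_a, F\rangle|^2\, \mathcal{P}_{T^\perp}(A_a) \left(\mathcal{P}_{T^\perp}(A_a)\right)^*.$$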
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The definition of the spectral norm |j7W|j := max^.|[^j| ^i {ip,Mip) allows us to bound 



\E(H^^Hl)\\<^ max 



< 



< 



\^- Y. \{Aa,F)f{iP,VT^{A^){VT^(Aa)r^j)\ 
1 nin2 ^.^. , 

iy(F) max ( ih, T^ 

\ae[ni]x[n2] 



^aPT^ (^a) (Pt^ (^a))* U 



max {ip, ip) 

■!/>:||l/;||2=l 



< 



KF) 



where the last inequality uses the fact that ||Aa|| = — . Therefore, 



E Y. ^-.Hl 



< v[F)nin2- := V. 



Besides, the definition (HO)) of v{F) allows us to bound 









The fact that EH a, = yields 

1 



qnin2 



Vt^A^ [F] 



E-Vt^{A^,){Ao.,,F) 



<VHF)- 



and hence 



\H^A\ < 



qnin2 



-Vt±A^{F) 



-Vt^{A^^){Ao.„F) 

q 



< 



2yMF) 



Applying the Operator Bernstein inequality [5S1 Theorem 6] yields that for any t < ^J v{F)nin2, we have 

v^J'^i^a^ + a'-){f) >t 

with probability at most cg exp I t^ j for some positive constants cg and cg. 

G Proof of Lemma [5] 

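Suppose there is a non-zero perturbation $(H, T)$ such that $(X + H, S + T)$ is the optimizer of Robust-EMaC.
One can easily verify that $\mathcal{P}_{\Omega^\perp}(S + T) = 0$, otherwise we can always set $S + T$ as $\mathcal{P}_\Omega(S + T)$ to yield a
better estimate. This together with the fact that $\mathcal{P}_{\Omega^\perp}(S) = 0$ implies that $\mathcal{P}_\Omega(T) = T$. Observe that the
constraints of Robust-EMaC indicate

$$\mathcal{P}_\Omega(X + S) = \mathcal{P}_\Omega(X + H + S + T) \quad\Longrightarrow\quad \mathcal{P}_\Omega(H + T) = 0,$$

which is equivalent to requiring $\mathcal{A}'_\Omega(H_e) = -\mathcal{A}'_\Omega(T_e) = -T_e$ and $\mathcal{A}^\perp(H_e) = 0$.

Recall that $H_e$ and $S_e$ are the enhanced forms of $H$ and $S$, respectively. Set $W_0 \in T^\perp$ to be a matrix
satisfying $\langle W_0, \mathcal{P}_{T^\perp}(H_e)\rangle = \|\mathcal{P}_{T^\perp}(H_e)\|_*$ and $\|W_0\| \le 1$, then $UV^* + W_0$ is a subgradient of the nuclear
norm at $X_e$. This gives

$$\|X_e + H_e\|_* \ge \|X_e\|_* + \langle UV^* + W_0, H_e\rangle = \|X_e\|_* + \langle UV^*, H_e\rangle + \|\mathcal{P}_{T^\perp}(H_e)\|_*. \qquad (76)$$

Owing to the fact that $\mathrm{support}(S) \subseteq \Omega^{\text{dirty}}$, one has $S_e = \mathcal{A}'_{\Omega^{\text{dirty}}}(S_e)$. Combining this and the fact that
$\mathrm{support}(S_e + T_e) \subseteq \Omega$ yields

$$\|S_e + T_e\|_1 = \|\mathcal{A}'_{\Omega^{\text{clean}}}(T_e)\|_1 + \|S_e + \mathcal{A}'_{\Omega^{\text{dirty}}}(T_e)\|_1,$$

which further gives

$$\|S_e + T_e\|_1 - \|S_e\|_1 = \|\mathcal{A}'_{\Omega^{\text{clean}}}(T_e)\|_1 + \|S_e + \mathcal{A}'_{\Omega^{\text{dirty}}}(T_e)\|_1 - \|S_e\|_1$$
$$\ge \|\mathcal{A}'_{\Omega^{\text{clean}}}(T_e)\|_1 + \langle \mathrm{sgn}(S_e), \mathcal{A}'_{\Omega^{\text{dirty}}}(T_e)\rangle \qquad (77)$$
$$= \|\mathcal{A}'_{\Omega^{\text{clean}}}(T_e)\|_1 - \langle \mathrm{sgn}(S_e), \mathcal{A}'_{\Omega^{\text{dirty}}}(H_e)\rangle \qquad (78)$$
$$= \|\mathcal{A}'_{\Omega^{\text{clean}}}(H_e)\|_1 - \langle \mathcal{A}'_{\Omega^{\text{dirty}}}(\mathrm{sgn}(S_e)), H_e\rangle$$
$$= \|\mathcal{A}'_{\Omega^{\text{clean}}}(H_e)\|_1 - \langle \mathrm{sgn}(S_e), H_e\rangle. \qquad (79)$$

Here, (77) follows from the fact that $\mathrm{sgn}(S_e)$ is the subgradient of $\|\cdot\|_1$ at $S_e$, and (78) arises from the
identity $\mathcal{P}_{\Omega^{\text{dirty}}}(H + T) = 0$ and hence $\mathcal{A}'_{\Omega^{\text{dirty}}}(H_e) = -\mathcal{A}'_{\Omega^{\text{dirty}}}(T_e)$. The inequalities (76) and (79) taken
collectively lead to

$$\|X_e + H_e\|_* + \lambda \|S_e + T_e\|_1 - \left(\|X_e\|_* + \lambda \|S_e\|_1\right)$$
$$\ge \langle UV^*, H_e\rangle + \|\mathcal{P}_{T^\perp}(H_e)\|_* + \lambda \|\mathcal{A}'_{\Omega^{\text{clean}}}(H_e)\|_1 - \lambda \langle \mathrm{sgn}(S_e), H_e\rangle$$
$$\ge -\langle \lambda\, \mathrm{sgn}(S_e) - UV^*, H_e\rangle + \|\mathcal{P}_{T^\perp}(H_e)\|_* + \lambda \|\mathcal{A}'_{\Omega^{\text{clean}}}(H_e)\|_1. \qquad (80)$$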
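The $\ell_1$ step (77) rests on the elementwise inequality $|s + t| \ge |s| + \mathrm{Re}(\overline{\mathrm{sgn}(s)}\, t)$ on the support of $S_e$, together with the fact that off-support entries of $T_e$ contribute their full magnitude. A minimal sketch on plain vectors, assuming a randomly chosen support (the variable names are illustrative, and the inner product is taken as its real part):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 12

# Sparse "corruption" vector S supported on a random dirty set (assumed for illustration).
dirty = rng.choice(n, size=4, replace=False)
S = np.zeros(n, dtype=complex)
S[dirty] = rng.standard_normal(4) + 1j * rng.standard_normal(4)

# Arbitrary perturbation T.
T = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# sgn(S): unit-modulus phases on the support, zero elsewhere.
sgn = np.zeros(n, dtype=complex)
sgn[dirty] = S[dirty] / np.abs(S[dirty])

clean = np.ones(n, dtype=bool)
clean[dirty] = False

lhs = np.abs(S + T).sum() - np.abs(S).sum()             # ||S + T||_1 - ||S||_1
rhs = np.abs(T[clean]).sum() + np.vdot(sgn, T).real     # clean part + <sgn(S), T_dirty>
```

Since `sgn` vanishes off the support, `np.vdot(sgn, T)` only involves the dirty entries of `T`, mirroring the role of $\mathcal{A}'_{\Omega^{\text{dirty}}}(T_e)$ in (77).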



It remains to show that the right-hand side of (80) cannot be negative. For a dual matrix $W$ satisfying
Conditions (44), one can derive



{Xsgn{Se)-UV*,He) 
= {W + Asgn (Sc) - UV* , iif e) - (W, He) 
= {Vt {W + Asgn(Se) - UV*),Vt {H,)) + {Vt± {W + Asgn(5e) - UV*) ,Vt± {H^)) 

- (Xc:_ (W-) ,-4[,.,_ {He)) - (X(^.,_). (W-) ,-4;^,_). (ife)' 



(81) 



where the last inequality follows from the four properties of $W$ in (44). Since $(X + H, S + T)$ is assumed
to be the optimizer, substituting (81) into (80) then yields



>I|X, 



-ffn 



A||Se+Te||i-(||Xe 



> - \\Vt^ {He 



>-\\VT^{He)l 



All^clli) 

A 



-A||X.,_(Jf,)||,~^3l|7'T(ffc)llp 
4 n^nj 

h ||X,.,_ (Jfe)llF - -^ \\Vt {He)\\^ , 



(82) 



(83) 



where ([83)) arises due to the inequality |lAi'||p < ||Af||-^. 

The invertibility condition ()42p on VrAQd^anVT is equivalent to 



Vt-Vt 



p{l-s) 



A: 



n<:l'=' 



A^]Vt 



1 
^2' 



indicating that 



1 



ll^T (-ff, 



c;iiF 



< 



Vt 



p{l-s) 



Ajcica., +^^7^7 {He) 



< 2 II^T (/f, 



o;iiF ■ 
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One can, therefore, bound \\Vt (-f^c)llF ^^ follows 

1 



\Vt{H,)\\^<2 



Vt 



p{l-s) 



An^,...+A-^]VT{H,) 



< 



< 



p{l-s) 
2 






< 



pil-s) 
+ 2 WVtA-^ (Ifc)llF + 2 WVtA^Vt^ (Ife)llF 

./_ . (Pa-_ (ffe)llF + PocicanT'T- (//e)|lF) + 2 HT't- (ffc)llF : 



where the last inequality exploits the facts that $\mathcal{A}^\perp(H_e) = 0$ and $\|\mathcal{P}_T(M)\|_F \le \|M\|_F$.

Recall that Aaci^,ia corresponds to sampling with replacement. Condition (P5| together with 
to 

WVt (-ffe)llF < ^°^^°fry^ (ll-^^^c.™ (Jfe)llF + \\A[,.,...Vt^ (H,)]]^) + 2 IIT't^ (/fe)|lF 



(84) 



leads 



< 



p(l-s) 
201og(nin2) ,, ., 



4,.,e.„(Jfe)|li 



p(l-s) 
201og(nin2) 



2 ||Pr^ (ffe 



where the last inequality follows from the fact that HA^'Hp < ||Af ||_^. Substituting (|55|) into ([55)1 yields 



(85) 



A ^201og(n,n2) , 2)) ||^,. (,j,)||^ + A ^^ 201og(n,n.) 



4 nfn^ V P(l-s) 



4 p{l~s)n{ni 



P[,.,_(/fe)|lF<0. (86) 



Since A < 1 and pn\n\ ^ log(nin2), both terms on the left-hand side of (|55)) are positive. This can only 
occur when 

Vti- (H,) = and ^[^cio»„ (H,) = 0. 



(87) 



(1) Consider first the situation where 

rT(ffe)||p 

One can immediately see that 



<!!M\\Vr.iH^)h. 



\\Vt (H 



cJllF 



<!!tl\\Vr.{H^)h = 



That said, Robust-EMaC succeeds in finding X^ under Condition 
(2) Consider instead the complement situation where 



||7'T(i/c)||F>^rT-(-H'e)|lF- 

Note that A'^^^i^^n {H c) = .4^(ifc) = and VtAVt (i-s) "^T-^fjcicanT^T < \- Using the same argument 

as in the proof of Lemma |3] (see the second part of Appendix IB|) with Q, replaced by Jl'^''^'*", we can conclude 
i?r = 0. 
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H Proof of Lemma [9] 

By definition of i/(-), we can bound 



V (Fo) = max | ( A(fe,,) , UV* - Wrsgn (S,)) \ 

(fcj)e[ni]x[n2] UJkl 



< max 



IX |J-|(A(,,;),C/V*)|' + — A2|(A(,,,),PT(sgn(Se)))rt 

i]x[n2] I^Wfe,/ l^fc,i J 



< ^^+ max — |(A(fe,),7'T(sgn(5c)))|', (89) 

where the last inequality follows from the definition of $\mu_2$ in (19). In order to develop a good bound on
$\nu(F_0)$, we still need to bound the component $|\langle A_{(k,l)}, \mathcal{P}_T(\mathrm{sgn}(S_e))\rangle|$. This is achieved through the following
lemma.

Lemma 11. Suppose that s is a positive constant. If p > C21 ™'^^t^^'^i'=°-''' log [nin2), one has 

1 I/. „ . /c\\\|2/ psmax(/Z4,/xiCs)rlog(nin2) 



max 



|(A(fc,i),7'T(sgn(Se)))| <C20- 



wii/i probability at least 1 — (nin2)"^. 

By setting A :— , , we can derive from Lemma [Til that 

Y"l"2plog(nin2) 

2 ,91/. ^ / /^NN\i2 max(u4,/iiCs) r 
inax A^ A(,^i),pT sgn 5e < C20 ^22 ^ 

(fc,i)e[ni]x[n2] WfcJ l\ \w /I j^z^z 

where we also use the assumption s < 1/2. This together with (I59|) leads to 

^1:1 ^ / C9max(/xiCs,Ai2,Ai4)r- 

^ [Fq) < 2"^ 

nfn2 

for some cg > 0. Applying Lemma [5] then shows that 

.(Fo4.(Fo)<^^i^^50^|iMZ: (90) 

4' 4*rifn2 

with high probability, as long as m > 07 max (/iiCg, /i4) r log (nin2) for some C7 > 0. 
Finally, the proof of Lemma fTTJ proceeds as follows. 

Proof. By definition, $\Omega^{\text{dirty}}$ is the set of distinct locations that appear in $\Omega$ but not in $\Omega^{\text{clean}}$. To simplify
the analysis, we introduce an auxiliary multiset $\tilde{\Omega}^{\text{dirty}}$ that contains $p s n_1 n_2$ i.i.d. entries. Specifically, suppose
that $\Omega = \{a_i \mid 1 \le i \le p n_1 n_2\}$, $\Omega^{\text{clean}} = \{a_i \mid 1 \le i \le p(1 - s) n_1 n_2\}$ and $\tilde{\Omega}^{\text{dirty}} = \{a_i \mid p(1 - s) n_1 n_2 < i \le p n_1 n_2\}$,
where the $a_i$'s are independently and uniformly selected from $[n_1] \times [n_2]$.
In addition, we consider an equivalent model for $\mathrm{sgn}(S)$ as follows

• Define $K = (K_{\alpha,\beta})_{1\le\alpha\le n_1,\, 1\le\beta\le n_2}$ to be a random $n_1 \times n_2$ matrix such that all of its entries are
independent and have amplitude 1 (i.e. in the real case, all entries are either 1 or $-1$, and in the
complex case, all entries have amplitude 1 and arbitrary phase). We assume that $\mathbb{E} K = 0$.

• Set $\mathrm{sgn}(S)$ such that $\mathrm{sgn}(S_{\alpha,\beta}) = K_{\alpha,\beta}\, \mathbb{1}_{\{(\alpha,\beta)\in\Omega^{\text{dirty}}\}}$, and hence

$$\mathrm{sgn}(S_e) = \sum_{(\alpha,\beta)\in\Omega^{\text{dirty}}} K_{\alpha,\beta} \sqrt{\omega_{\alpha,\beta}}\, A_{\alpha,\beta}.$$
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Recall that support (S^) C $7'^"'*''. Rather than directly studymg the property of sgii(Se), we will first 
examine the property of an auxiliary matrix 



pniri2 



i=p(l — s)nin2 + l 



and then bound the difference between S c and sgn {S c)- 

For any given pair (fc, /) G [ni] x [n2], define a random variable 



1q,^ 



ijJk.l 



{VrA^k.l), Ka.JjAa.p) 



Thus, 2^ai's are conditionally independent given K. The conditional mean and second moment of Za^ can 
be computed as 

E f Z„,. I K) = — L= {VTA(k,i) ,K,), 



and 



711712 ^UJkl 



1 1 






< 



be[ni]x[n2] 

max(/iiCs,/i4)r- _ 

2 2 • 

71^772 



F, 



where the last inequality follows from the definition of fi^. Besides, applying Lemma [5] allows us to bound 
the magnitude of Za^fj as follows 

\Z^,^\<^^:^B. (91) 



711772 



Applying the Bernstein inequality ^281 Theorem 6] yields that: for any t < |ps. 



E 2« 

L i— p(l — 6*)nin2 + l 



ps 



/UJk.l 



{VTAi^k,i),K,) 



> i < 27ii7i2 exp 



t' 



4/9S71l7l2V" 



2711 7i2 exp — 



4ps max(/iiCs,/i4)r 
nin2 



(92) 



Recall that 5 e := J2iZ'p{i~s)nin2+i ^a.^/i^Aa,. Conditional on K, -^ (A(^k.i),VT iS A ) is equiv- 
alent to '^i2:],7i~s)n n +1 -^o-i ^^ distribution. We also note that for any constant C20 > 0, there exists a 
constant C21 > such that if ps7ii7i2 > C21 max (/iiCg, p^) r log (711712), then one has 



'c2opsmax(/iiCs,Ai4)?'log(7ii7i2) ^ 2 

71l7l2 i 



Therefore, by setting t := y/ <=2opsra!,^(,.ic,^,j^i)r,og(mn2) jn ([gg)^ we can show that with probability exceeding 



/ C2ops max(/iiCs,;A4)r log(nin2) 

' V nin2 

I — (771712) ^, one has, when conditional on K, that 
1 



^k,l 



A(^k,l) ,rT[Sj)-ps (VTA^k.l) , ^e) 



2 ^ C20PS max iniCs, Pi) r log (711712) 

~ 71i7l2 



for every (fc, Z) S [711] x [712], provided that psnin2 > C21 max (/iiCg, /i4) t' log (711772) for some constants 
C20,C21 > 0. 
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The next step is to bound — == \VT-^{k,i): I^c/- For convenience of analysis, we represent as 



aG[ni]x [712] 



where z^'s are independent (not necessarily i.i.d.) zero-mean random variables satisfying \za\ — 1. Let 
3^0, :— "'" (VTAfknjZay/tJ^Aa), then the definition of /i4 and Lemma [2] allow us to compute 



\ya 



^ya = 0, 

^ '• ■ ' ^ ' ' 71171,2 



/^kA 



and 



E ^y<^y: = — E K^TA(.,,,v^^.)|^<=i^^i£iiMz::=f. 



ag[ni]x[n2] 



tOk.l 



aS[ni]x[n2] 



nin2 



Applying the Bernstein inequality [28, Theorem 6] suggests that for any ^ < |, 

1 



/uJkj: 



{VTAf^kj),K,) 



>t>< 2nin2 exp 



3' 
i2 



!c{fj.iCs,fi4}r 



Consequently, there exists a constant C24 > such that 

P^s^ 1/^ . r, \|2 ^ C24/5^s2 max {/iiCs,/i4}r log (77,1712) 

\\rTJ^(k,l},-l^c)\ < 

UJk,l 771772 

with high probability. This together with ([M)) suggests that 



1 



A 



(fe,0 



,'Pt{s,) 



E 2° 

i=p(l — s)nin2+l 



< 2 



pnin2 

E 



2a. - 



JJS_ 

Mk.l 



{VTA(k,i).K,) 



< 



-i— p(l — s)nin2 + l 

4c24P5 max (//iCg, iU4) ^ log (nin2) 
nin2 



2 2 

-2^^|(7'TA(,,i),Ke)| 



(94) 



with high probability. 

We still need to bound the deviation of $\tilde{S}_e$ from $\mathrm{sgn}(S_e)$. Observe that the difference between them
arises from sampling with replacement, i.e. there are a few entries in $\{a_i \mid p(1 - s) n_1 n_2 < i \le p n_1 n_2\}$ that
either fall within $\Omega^{\text{clean}}$ or have appeared more than once. A simple Chernoff bound argument (e.g. [33])
indicates that the number of aforementioned conflicts is upper bounded by $10 \log(n_1 n_2)$ with high probability.
That said, one can find a collection of entry locations $\{b_1, \cdots, b_N\}$ such that



N 



5 c - Sgn (S c) = E ■^''> V^^b, , 



(95) 



where N < 10 log (711772) with high probability. Therefore, we can bound 

N 

E 



N 

A^k,i),VT [S e - sgn(S e)))| < E -= KAm)'^t (v^Ab,))! 



< TV 



771772 
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and hence, for some constant C22 > 0, 



1 



A^kA),VT(s,~sgn{S,) 



2 /i^c^r^ log^ (nin2) ^ C2ops max (^ic^, /i4) r log (711^2) 
< C22 ^^^ < 



ntn 



i"-2 



nin2 



holds with high probability, provided that psnin2 > C23/iiCs?' log (riin2) for some C23 > 0. 
Putting the above results together yields that for every (fc, Z) e [ni] x [712], 



1 9 / / 

|(A(fe,,),7'T(sgn(Se)))|'< (A(fc,,),7'T(^c-sgn(5c) 

C25ps max {^ilCs, ^4) ?" log (nirt2) 



2 2 



A{k,l),'PT [S c 



< 



nin2 



for some constant C25 > 0, which completes the proof. 



D 



I Proof of Lemma [TO 



Consider the model of $\mathrm{sgn}(S)$, $K$ and $\tilde{S}_e$ as introduced in the proof of Lemma 11 in Appendix H. For any
$(\alpha, \beta) \in [n_1] \times [n_2]$, define



With this notation, we can see that 2a. 's are conditionally independent given K, and satisfy 

1 

\/l^a,BAaRKaj3 — — 
(Q,,3)e[ni]x[n2] 



E 



(Za. I K] = V" y'OJa^pAa,/^ 

V / niJl2 , „, ^, ^ , ^ 

' l]x[n2] 



-K, 



n\n2 



-'a, 13 



and 



E 



(^a.^:^ 



K 



< 



711772 



/ ^ ||Wq,/3j4q^/3j4^ 



1 :=V. 



(a,/3)e[ni]x[n2] 



Since S'e = '^i2}i-s)nn n +1 -^li ' ^Pplyi^ig the Bernstein inequality |28| Theorem 6] implies that for any 
t < 2ps7ii772, one has, conditioned on K, that 



Sc - psKc 



> t] < 2771772 exp 



i2 



4/9S7ll772T^ 

Therefore, conditioned on K, there exists a constant C22 > such that 



Sc — psKc 



< v/c22ps?^ln-2 log (771772) 



(96) 



with probability at least than 1 — 77i ""772 . 

The next step is to bound the operator norm of $p s K_e$. For convenience of analysis, we write

$$K_e = \sum_{a\in[n_1]\times[n_2]} z_a \sqrt{\omega_a}\, A_a,$$

for a collection of independent (not necessarily i.i.d.) random variables $z_a$'s satisfying $|z_a| = 1$ and $\mathbb{E} z_a = 0$.

Let $Y_a := z_a \sqrt{\omega_a}\, A_a$, then we have $\mathbb{E} Y_a = 0$, $\|Y_a\| = 1$, and



E ^y-y*^ 



ie[rii]x[n2] 



E UJa.Aa.Al 

ae[ni]x[n2] 



< 77l772 := V. 
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Therefore, applying the Bernstein inequahty yields that for any t < nin2, 



p{||js:,|l>0 = : 



ae[rii]x[n2] 



> t > < 2nin2 exp 



4nin2 



or, more simply, there exists a constant C22 > such that 



||-K"e|| < \/c22nin2log{nin2 
with high probability. This and (|M)) . taken collectively, yield 



< 



Se - psK^ 



ps \\Kc\\ < 2y/c22psnin2 log (711^2) 



with high probability. On the other hand, (|95|l implies that for sufficiently large ni and 712, 



N 



S e- sgn(S'o) < E llV^^bill = ^ < 101og(riin2) < \/c22psnin2 log (nin2) 



with high probability. Consequently, for a sufficiently small constant s, 

||7'T^(Asgn(5c))|| < A||sgn(5c)|| < A ||5c - sgn(5e 

< 2X^yc22psnln2 log (nin2) 
f 



+ A 



= 2y/C22S < 



with probability exceeding 1 — rii n2 ■ 



J Proof of Theorem [2] 



We prove this theorem under the conditions of Lemma |31 i.e. (|M|) - (P7)) . Note that these conditions are 
satisfied with high probability, as we have shown in the proof of Theorem [TJ 

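Denote the solution of Noisy-EMaC as $\hat{X}_e = X_e + H_e$. Since $H_e$ is a two-fold Hankel matrix, i.e.
$H_e = \mathcal{A}_\Omega(H_e) + \mathcal{A}_{\Omega^\perp}(H_e)$, we can obtain

$$\|X_e\|_* \ge \|\hat{X}_e\|_* = \|X_e + H_e\|_* \ge \|X_e + \mathcal{A}_{\Omega^\perp}(H_e)\|_* - \|\mathcal{A}_\Omega(H_e)\|_*. \qquad (97)$$

The second term can be bounded using the triangle inequality as

$$\|\mathcal{A}_\Omega(H_e)\|_F \le \left\|\mathcal{A}_\Omega\left(\hat{X}_e - X_e^o\right)\right\|_F + \left\|\mathcal{A}_\Omega\left(X_e - X_e^o\right)\right\|_F. \qquad (98)$$

Since the constraint of Noisy-EMaC requires $\|\mathcal{P}_\Omega(\hat{X} - X^o)\|_F \le \delta$ and $\|\mathcal{P}_\Omega(X - X^o)\|_F \le \delta$, the Hankel
structure of the enhanced form allows us to bound $\|\mathcal{A}_\Omega(\hat{X}_e - X_e^o)\|_F \le \sqrt{n_1 n_2}\,\delta$ and $\|\mathcal{A}_\Omega(X_e - X_e^o)\|_F \le \sqrt{n_1 n_2}\,\delta$, which immediately leads to

$$\|\mathcal{A}_\Omega(H_e)\|_F \le 2\sqrt{n_1 n_2}\,\delta.$$

Using the same analysis as for (62) allows us to bound the perturbation $\mathcal{A}_{\Omega^\perp}(H_e)$ as follows

$$\|X_e + \mathcal{A}_{\Omega^\perp}(H_e)\|_* \ge \|X_e\|_* + \frac{1}{4} \|\mathcal{P}_{T^\perp} \mathcal{A}_{\Omega^\perp}(H_e)\|_F.$$

Combining this with (97), we have

$$\|\mathcal{P}_{T^\perp} \mathcal{A}_{\Omega^\perp}(H_e)\|_F \le 4 \|\mathcal{A}_\Omega(H_e)\|_* \le 4\sqrt{n_1 n_2}\, \|\mathcal{A}_\Omega(H_e)\|_F \le 8 n_1 n_2 \delta.$$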
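The factor $\sqrt{n_1 n_2}$ in the bound on $\|\mathcal{A}_\Omega(\cdot)\|_F$ comes from the fact that each sample is repeated $\omega_a \le n_1 n_2$ times in the enhanced matrix. The sketch below illustrates the same effect in the simplified one-fold (1-D Hankel) case, where the repetition count is at most $n$; the helper `hankel_enhance` and all sizes are assumptions for illustration.

```python
import numpy as np

def hankel_enhance(x, k):
    """One-fold enhancement: k x (n-k+1) Hankel matrix with entries x[i + j]."""
    n = len(x)
    return np.array([[x[i + j] for j in range(n - k + 1)] for i in range(k)])

rng = np.random.default_rng(5)
n, k = 11, 6
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # stand-in for a residual

X_e = hankel_enhance(x, k)

# omega[a] = number of times sample a is repeated in the enhanced matrix.
omega = np.array([sum(1 for i in range(k) if 0 <= a - i < n - k + 1) for a in range(n)])

fro_sq = np.linalg.norm(X_e, 'fro') ** 2                   # ||X_e||_F^2
weighted = np.sum(omega * np.abs(x) ** 2)                  # sum_a omega_a |x_a|^2
```

Because `omega.max()` never exceeds `n`, the enhanced-form error is at most $\sqrt{n}$ times the error on the original samples in this 1-D setting, which is the analogue of the $\sqrt{n_1 n_2}\,\delta$ bound used above.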






Further from Lemma [31 we know that 

WVTAn^ (fl-e)llp < '^V2\\VT^An^ {H,)\\^. (99) 

Therefore, combining all the above results give 

\\H4f < \\An{H,)\\F + WVrAn^ {H,)\\p + WVt^A^^ {H,)\\p 



8V2nl 



2^2 



I "^ J 

K Proof of Theorem [H 

In order to extend the results to structured Hankel matrix completion, from the proof of Theorem [T] it is 
sufficient to have the first two conditions in psp to hold for general Hankel matrices. The proof is done by 
recognizing these two conditions are equivalent to (|26p . 
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